Szekely, Rizzo and Bakirov (2007) and Szekely and Rizzo (2009), in two seminal papers, introduced the powerful concept of distance correlation as a measure of dependence between sets of random variables. We study in this paper an affinely invariant version of the distance correlation and an empirical version of that distance correlation, and we establish the consistency of the empirical quantity. In the case of subvectors of a multivariate normally distributed random vector, we provide exact expressions for the distance correlation in both finite-dimensional and asymptotic settings. To illustrate our results, we consider time series of wind vectors at the Stateline wind energy center in Oregon and Washington, and we derive the empirical auto and cross distance correlation functions between wind vectors at distinct meteorological stations.
1 Introduction



Szekely, Rizzo and Bakirov (2007) and Szekely and Rizzo (2009), in two seminal papers, 
introduced the distance covariance and distance correlation as powerful measures of 
dependence. Contrary to the classical Pearson correlation coefficient, the population 
distance covariance vanishes only in the case of independence, and it applies to random 
vectors of arbitrary dimensions, rather than to univariate quantities only. 

*Institut für Angewandte Mathematik, Universität Heidelberg, Im Neuenheimer Feld 294, 69120 Heidelberg, Germany.

†Department of Statistics, Pennsylvania State University, University Park, PA 16802, U.S.A.









Dueck, Edelmann, Gneiting, and Richards 



As noted by Newton (2009), the "distance covariance not only provides a bona fide 
dependence measure, but it does so with a simplicity to satisfy Don Geman's elevator 
test (i.e., a method must be sufficiently simple that it can be explained to a colleague 
in the time it takes to go between floors on an elevator!)." In the case of the sample 
distance covariance, find the pairwise distances between the sample values for the first 
variable, and center the resulting distance matrix; then do the same for the second 
variable. The square of the sample distance covariance equals the average entry in 
the componentwise or Schur product of the two centered distance matrices. Given the 
theoretical appeal of the population quantity, and the striking simplicity of the sample 
version, it is not surprising that the distance covariance is experiencing a wealth of 
applications, despite having been introduced merely half a decade ago. 
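The elevator description translates almost line by line into code. The sketch below is our own illustration, not the authors' implementation (they point to the R package of Rizzo and Szekely, 2011). It assumes Python with NumPy and stores each sample as an array with one observation per row. The function names are hypothetical.

```python
import numpy as np

def pairwise_distances(z):
    """Euclidean distance matrix between the rows of z (1-D input = scalar data)."""
    z = np.asarray(z, dtype=float)
    z = z.reshape(len(z), -1)
    diff = z[:, None, :] - z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def double_center(d):
    """Subtract row and column means, then add back the grand mean."""
    return d - d.mean(axis=1, keepdims=True) - d.mean(axis=0, keepdims=True) + d.mean()

def dcov2(x, y):
    """Squared sample distance covariance: average entry of the Schur
    (componentwise) product of the two double-centred distance matrices."""
    a = double_center(pairwise_distances(x))
    b = double_center(pairwise_distances(y))
    return float((a * b).mean())

# Two points at distance 1: the centred matrix has entries +-1/2,
# so every entry of the Schur product equals 1/4.
print(dcov2([0.0, 1.0], [0.0, 1.0]))  # 0.25
```

Because both centred matrices are formed explicitly, this costs $O(n^2)$ time and memory, which is adequate for moderate sample sizes.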

Specifically, let p and q be positive integers. For column vectors $s\in\mathbb R^p$ and $t\in\mathbb R^q$, denote by $|s|_p$ and $|t|_q$ the standard Euclidean norms on the corresponding spaces; thus, if $s = (s_1,\ldots,s_p)'$ then

$$|s|_p = (s_1^2 + \cdots + s_p^2)^{1/2},$$

and similarly for $|t|_q$. For vectors u and v of the same dimension, p, we let $\langle u,v\rangle_p$ be the standard Euclidean scalar product of u and v. For jointly distributed random vectors $X\in\mathbb R^p$ and $Y\in\mathbb R^q$, let

$$f_{X,Y}(s,t) = E\exp\big[i\langle s,X\rangle_p + i\langle t,Y\rangle_q\big]$$

be the joint characteristic function of (X,Y), and let $f_X(s) = f_{X,Y}(s,0)$ and $f_Y(t) = f_{X,Y}(0,t)$ be the marginal characteristic functions of X and Y, where $s\in\mathbb R^p$ and $t\in\mathbb R^q$. Szekely, et al. (2007) introduced the distance covariance between X and Y as the nonnegative number $\mathcal V(X,Y)$ defined by

$$\mathcal V^2(X,Y) = \frac{1}{c_pc_q}\int_{\mathbb R^{p+q}}\frac{|f_{X,Y}(s,t) - f_X(s)f_Y(t)|^2}{|s|_p^{p+1}\,|t|_q^{q+1}}\,ds\,dt, \tag{1.1}$$

where $|z|$ denotes the modulus of $z\in\mathbb C$ and

$$c_p = \frac{\pi^{(p+1)/2}}{\Gamma\big(\tfrac12(p+1)\big)}. \tag{1.2}$$

The distance correlation between X and Y is the nonnegative number defined by

$$\mathcal R(X,Y) = \frac{\mathcal V(X,Y)}{\sqrt{\mathcal V(X,X)\,\mathcal V(Y,Y)}} \tag{1.3}$$

if both $\mathcal V(X,X)$ and $\mathcal V(Y,Y)$ are strictly positive, and defined to be zero otherwise. For distributions with finite first moments, the distance correlation characterizes independence in that $0\le\mathcal R(X,Y)\le1$, with $\mathcal R(X,Y) = 0$ if and only if X and Y are independent.
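The normalizing constant in (1.1) is, following Szekely, et al. (2007), $c_p = \pi^{(p+1)/2}/\Gamma(\frac12(p+1))$. This is half the surface area of the unit sphere in $\mathbb R^{p+1}$. The short Python check below is our own, and the helper names `c` and `sphere_area` are hypothetical. It confirms $c_0 = 1$, $c_1 = \pi$ and $c_2 = 2\pi$, values that reappear in Sections 3 and 4.

```python
import math

def c(p):
    """Normalizing constant c_p = pi^{(p+1)/2} / Gamma((p+1)/2)."""
    return math.pi ** ((p + 1) / 2) / math.gamma((p + 1) / 2)

def sphere_area(d):
    """Surface area of the unit sphere S^{d-1} in R^d."""
    return 2 * math.pi ** (d / 2) / math.gamma(d / 2)

for p in range(4):
    # c_p equals half the area of the unit sphere in R^{p+1}.
    print(p, c(p), sphere_area(p + 1) / 2)
```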

A crucial property of the distance correlation is that it is invariant under transfor- 
mations of the form 



$$(X,Y)\mapsto(a_1 + b_1C_1X,\ a_2 + b_2C_2Y), \tag{1.4}$$

where $a_1\in\mathbb R^p$ and $a_2\in\mathbb R^q$, $b_1$ and $b_2$ are nonzero real numbers, and the matrices $C_1\in\mathbb R^{p\times p}$ and $C_2\in\mathbb R^{q\times q}$ are orthogonal. However, the distance correlation fails to be
invariant under the group of all invertible affine transformations of (X,Y), which led 
Szekely, et al. (2007, pp. 2784-2785) and Szekely and Rizzo (2009, pp. 1252-1253) to 
propose an affinely invariant sample version of the distance correlation. 
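The contrast between (1.4) and general affine maps is easy to see numerically. The following sketch is Python with NumPy, and the data-generating model and the function name `dcor` are our own illustrative choices. Applying a shift, a nonzero scalar and a rotation to X leaves the sample distance correlation unchanged, whereas a shear changes it.

```python
import numpy as np

def _distances(z):
    z = np.asarray(z, dtype=float).reshape(len(z), -1)
    return np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)

def _double_center(d):
    return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()

def dcor(x, y):
    """Sample distance correlation (rows are observations); zero if a distance variance vanishes."""
    a, b = _double_center(_distances(x)), _double_center(_distances(y))
    vxy = max(float((a * b).mean()), 0.0)
    vxx, vyy = float((a * a).mean()), float((b * b).mean())
    if vxx <= 0.0 or vyy <= 0.0:
        return 0.0
    return (vxy / np.sqrt(vxx * vyy)) ** 0.5

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))
y = x[:, :1] + 0.5 * rng.normal(size=(200, 1))

# A transformation of the form (1.4): shift, nonzero scalar, rotation.
angle = 0.7
rotation = np.array([[np.cos(angle), -np.sin(angle)], [np.sin(angle), np.cos(angle)]])
x_rot = 3.0 - 2.0 * x @ rotation.T
# An invertible linear map that is not of the form (1.4): a shear.
shear = np.array([[1.0, 4.0], [0.0, 1.0]])
x_shear = x @ shear.T

print(dcor(x, y), dcor(x_rot, y), dcor(x_shear, y))
```

Under (1.4) every pairwise distance of X is multiplied by $|b_1|$, and this factor cancels in the ratio. A shear distorts distances non-uniformly, which is why it changes the value.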

Adapting this proposal to the population setting, the affinely invariant distance covariance between distributions X and Y with finite second moments can be introduced as the nonnegative number $\widetilde{\mathcal V}(X,Y)$ defined by

$$\widetilde{\mathcal V}^2(X,Y) = \mathcal V^2\big(\Sigma_X^{-1/2}X,\ \Sigma_Y^{-1/2}Y\big), \tag{1.5}$$

where $\Sigma_X$ and $\Sigma_Y$ are the respective population covariance matrices. The affinely invariant distance correlation between X and Y is the nonnegative number defined by

$$\widetilde{\mathcal R}(X,Y) = \frac{\widetilde{\mathcal V}(X,Y)}{\sqrt{\widetilde{\mathcal V}(X,X)\,\widetilde{\mathcal V}(Y,Y)}} \tag{1.6}$$

if both $\widetilde{\mathcal V}(X,X)$ and $\widetilde{\mathcal V}(Y,Y)$ are strictly positive, and defined to be zero otherwise. In the sample versions proposed by Szekely, et al. (2007), the population quantities are replaced by their natural estimators. Clearly, the population affinely invariant distance correlation and its sample version are invariant under the group of invertible affine transformations, and in addition to satisfying this often-desirable group invariance property (Eaton, 1989), they inherit the desirable properties of the standard distance dependence measures. In particular, $0\le\widetilde{\mathcal R}(X,Y)\le1$ and, for populations with finite second moments and positive definite covariance matrices, $\widetilde{\mathcal R}(X,Y) = 0$ if and only if X and Y are independent.

The remainder of the paper is organized as follows. In Section 2 we review the sample version of the affinely invariant distance correlation introduced by Szekely, et al. (2007), and we prove that the sample version is strongly consistent. In Section 3 we provide exact expressions for the affinely invariant distance correlation in the case of subvectors from a multivariate normal population of arbitrary dimension, thereby generalizing a result of Szekely, et al. (2007) in the bivariate case; our result is nontrivial, being derived using the theory of zonal polynomials and the hypergeometric functions of matrix argument, and it enables the explicit and efficient calculation of the affinely invariant distance correlation in the multivariate normal case.

In Section 4 we study the behavior of the affinely invariant distance measures for subvectors of multivariate normal populations in limiting cases as the Frobenius norm of the cross-covariance matrix converges to zero, or as the dimensions of the subvectors converge to infinity. We expect that these results will motivate and provide the theoretical basis for many applications of distance correlation measures for high-dimensional data.

As an illustration of our results, Section 5 considers time series of wind vectors at the Stateline wind energy center in Oregon and Washington; we shall derive the empirical auto and cross distance correlation functions between wind vectors at distinct meteorological stations. In Section 6 we provide a discussion in which we make a case for the use of the distance correlation and the affinely invariant distance correlation, which we believe to be uniquely appealing and powerful multivariate measures of dependence. Finally, the paper closes with an Appendix in which we calculate the non-affine distance covariance and distance correlation for multivariate normal populations.



2 The Sample Version of the Affinely Invariant Distance Correlation

In this section, which is written primarily to introduce readers to distance correlation 
measures, we describe sample versions of the affinely invariant distance covariance and 
distance correlation as introduced by Szekely, et al. (2007, pp. 2784-2785) and Szekely 
and Rizzo (2009, pp. 1252-1253). 

First, we review the sample versions of the standard distance covariance and distance correlation. Given a random sample $(X_1,Y_1),\ldots,(X_n,Y_n)$ from jointly distributed random vectors $X\in\mathbb R^p$ and $Y\in\mathbb R^q$, we set

$$X = [X_1,\ldots,X_n]\in\mathbb R^{p\times n}\quad\text{and}\quad Y = [Y_1,\ldots,Y_n]\in\mathbb R^{q\times n}.$$

A natural way of introducing a sample version of the distance covariance is to let

$$f^n_{X,Y}(s,t) = \frac1n\sum_{j=1}^n\exp\big[i\langle s,X_j\rangle_p + i\langle t,Y_j\rangle_q\big]$$

be the corresponding empirical characteristic function, and to write $f^n_X(s) = f^n_{X,Y}(s,0)$ and $f^n_Y(t) = f^n_{X,Y}(0,t)$ for the respective marginal empirical characteristic functions. The sample distance covariance then is the nonnegative number $\mathcal V_n(X,Y)$ defined by

$$\mathcal V_n^2(X,Y) = \frac{1}{c_pc_q}\int_{\mathbb R^{p+q}}\frac{|f^n_{X,Y}(s,t) - f^n_X(s)f^n_Y(t)|^2}{|s|_p^{p+1}\,|t|_q^{q+1}}\,ds\,dt,$$

where $c_p$ is the constant given in (1.2).

Szekely, et al. (2007), in a tour de force, showed that

$$\mathcal V_n^2(X,Y) = \frac{1}{n^2}\sum_{k,l=1}^n A_{kl}B_{kl}, \tag{2.1}$$

where

$$a_{kl} = |X_k - X_l|_p,\qquad \bar a_{k\cdot} = \frac1n\sum_{l=1}^n a_{kl},\qquad \bar a_{\cdot l} = \frac1n\sum_{k=1}^n a_{kl},\qquad \bar a_{\cdot\cdot} = \frac{1}{n^2}\sum_{k,l=1}^n a_{kl},$$

and

$$A_{kl} = a_{kl} - \bar a_{k\cdot} - \bar a_{\cdot l} + \bar a_{\cdot\cdot},$$

and similarly for $b_{kl} = |Y_k - Y_l|_q$, $\bar b_{k\cdot}$, $\bar b_{\cdot l}$, $\bar b_{\cdot\cdot}$, and $B_{kl}$, where $k,l = 1,\ldots,n$. Thus, the squared sample distance covariance equals the average entry in the componentwise or Schur product of the centered distance matrices for the two variables. The sample distance correlation then is defined by

$$\mathcal R_n(X,Y) = \frac{\mathcal V_n(X,Y)}{\sqrt{\mathcal V_n(X,X)\,\mathcal V_n(Y,Y)}} \tag{2.2}$$

if both $\mathcal V_n(X,X)$ and $\mathcal V_n(Y,Y)$ are strictly positive, and defined to be zero otherwise.
Computer code for calculating these sample versions is available in an R package by 
Rizzo and Szekely (2011). 

Now let $S_X$ and $S_Y$ denote the usual sample covariance matrices of the data X and Y, respectively. Following Szekely, et al. (2007, p. 2785) and Szekely and Rizzo (2009, p. 1253), the sample affinely invariant distance covariance is the nonnegative number $\widetilde{\mathcal V}_n(X,Y)$ defined by

$$\widetilde{\mathcal V}_n^2(X,Y) = \mathcal V_n^2\big(S_X^{-1/2}X,\ S_Y^{-1/2}Y\big) \tag{2.3}$$

if $S_X$ and $S_Y$ are positive definite, and defined to be zero otherwise. The sample affinely invariant distance correlation is defined by

$$\widetilde{\mathcal R}_n(X,Y) = \frac{\widetilde{\mathcal V}_n(X,Y)}{\sqrt{\widetilde{\mathcal V}_n(X,X)\,\widetilde{\mathcal V}_n(Y,Y)}} \tag{2.4}$$

if the quantities in the denominator are strictly positive, and defined to be zero otherwise. The sample affinely invariant distance correlation inherits the properties of the sample distance correlation; in particular,

$$0\le\widetilde{\mathcal R}_n(X,Y)\le1,$$

and $\widetilde{\mathcal R}_n(X,Y) = 1$ implies that p = q, that the linear spaces spanned by X and Y have full rank, and that there exist a vector $a\in\mathbb R^p$, a nonzero number $b\in\mathbb R$, and an orthogonal matrix $C\in\mathbb R^{p\times p}$ such that $S_Y^{-1/2}Y = a + bCS_X^{-1/2}X$.
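A minimal sketch of (2.3) and (2.4) follows, in Python with NumPy. It assumes that $S_X$ and $S_Y$ are positive definite, so the zero convention for singular sample covariance matrices is not handled. We take the symmetric inverse square root, computed from an eigendecomposition, and the function names are our own. The check applies random invertible affine maps to both samples.

```python
import numpy as np

def _distances(z):
    return np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)

def _double_center(d):
    return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()

def inv_sqrt(s):
    """Symmetric inverse square root of a positive definite matrix."""
    w, v = np.linalg.eigh(s)
    return (v / np.sqrt(w)) @ v.T

def affine_dcor(x, y):
    """Sample affinely invariant distance correlation, cf. (2.3)-(2.4).

    Rows of x and y are observations; S_X and S_Y are assumed positive definite."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    s_x = np.atleast_2d(np.cov(x, rowvar=False))
    s_y = np.atleast_2d(np.cov(y, rowvar=False))
    # Row k of x @ S_X^{-1/2} is (S_X^{-1/2} X_k)', since S_X^{-1/2} is symmetric.
    a = _double_center(_distances(x @ inv_sqrt(s_x)))
    b = _double_center(_distances(y @ inv_sqrt(s_y)))
    vxy = max(float((a * b).mean()), 0.0)
    vxx, vyy = float((a * a).mean()), float((b * b).mean())
    if vxx <= 0.0 or vyy <= 0.0:
        return 0.0
    return (vxy / np.sqrt(vxx * vyy)) ** 0.5

rng = np.random.default_rng(1)
n = 150
x = rng.normal(size=(n, 3))
y = np.column_stack([x[:, 0] ** 2, x[:, 1]]) + 0.3 * rng.normal(size=(n, 2))
a1 = rng.normal(size=(3, 3)) + 3.0 * np.eye(3)   # invertible with probability one
a2 = rng.normal(size=(2, 2)) + 3.0 * np.eye(2)
r0 = affine_dcor(x, y)
r1 = affine_dcor(x @ a1.T + 5.0, y @ a2.T - 1.0)
print(r0, r1)
```

The invariance is exact rather than approximate. If $X\mapsto AX + a$, then $S_X\mapsto AS_XA'$, and $M = (AS_XA')^{-1/2}AS_X^{1/2}$ satisfies $MM' = I_p$. Hence the whitened data change only by an orthogonal map of the form (1.4).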

Our next result shows that the sample affinely invariant distance correlation is a consistent estimator of the respective population quantity.

Theorem 2.1. Suppose that $X = [X_1,\ldots,X_n]\in\mathbb R^{p\times n}$ and $Y = [Y_1,\ldots,Y_n]\in\mathbb R^{q\times n}$ are random samples from jointly distributed variables $(X,Y)\in\mathbb R^{p+q}$ with positive definite population covariance matrices $\Sigma_X\in\mathbb R^{p\times p}$ and $\Sigma_Y\in\mathbb R^{q\times q}$, respectively. Also, let $\hat\Sigma_X$ and $\hat\Sigma_Y$ be strongly consistent estimators for $\Sigma_X$ and $\Sigma_Y$, respectively. Then

$$\mathcal V_n^2\big(\hat\Sigma_X^{-1/2}X,\ \hat\Sigma_Y^{-1/2}Y\big)\to\widetilde{\mathcal V}^2(X,Y),$$

almost surely, as $n\to\infty$. In particular, the sample affinely invariant distance correlation satisfies

$$\widetilde{\mathcal R}_n(X,Y)\to\widetilde{\mathcal R}(X,Y), \tag{2.5}$$

almost surely.
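Proof. As the covariance matrices $\Sigma_X$ and $\Sigma_Y$ are positive definite, we may assume that the strongly consistent estimators $\hat\Sigma_X$ and $\hat\Sigma_Y$ also are positive definite. Since the sample distance covariance is itself strongly consistent (Szekely, et al., 2007), $\mathcal V_n^2(\Sigma_X^{-1/2}X,\Sigma_Y^{-1/2}Y)\to\mathcal V^2(\Sigma_X^{-1/2}X,\Sigma_Y^{-1/2}Y) = \widetilde{\mathcal V}^2(X,Y)$ almost surely. Therefore, in order to prove the first statement it suffices to show that

$$\mathcal V_n^2\big(\hat\Sigma_X^{-1/2}X,\ \hat\Sigma_Y^{-1/2}Y\big) - \mathcal V_n^2\big(\Sigma_X^{-1/2}X,\ \Sigma_Y^{-1/2}Y\big)\to0, \tag{2.6}$$

almost surely. By the decomposition of Szekely, et al. (2007, p. 2776, Equation (2.18)), the left-hand side of (2.6) can be written as an average of terms of the form

$$|\hat\Sigma_X^{-1/2}(X_k - X_l)|_p\,|\hat\Sigma_Y^{-1/2}(Y_k - Y_m)|_q - |\Sigma_X^{-1/2}(X_k - X_l)|_p\,|\Sigma_Y^{-1/2}(Y_k - Y_m)|_q.$$

Using the identity

$$|\hat\Sigma_X^{-1/2}(X_k - X_l)|_p\,|\hat\Sigma_Y^{-1/2}(Y_k - Y_m)|_q = \big|(\hat\Sigma_X^{-1/2} - \Sigma_X^{-1/2} + \Sigma_X^{-1/2})(X_k - X_l)\big|_p\,\big|(\hat\Sigma_Y^{-1/2} - \Sigma_Y^{-1/2} + \Sigma_Y^{-1/2})(Y_k - Y_m)\big|_q$$

and the triangle inequality, we obtain

$$\begin{aligned}\Big|\,|\hat\Sigma_X^{-1/2}(X_k - X_l)|_p\,&|\hat\Sigma_Y^{-1/2}(Y_k - Y_m)|_q - |\Sigma_X^{-1/2}(X_k - X_l)|_p\,|\Sigma_Y^{-1/2}(Y_k - Y_m)|_q\Big|\\ &\le\|\hat\Sigma_X^{-1/2} - \Sigma_X^{-1/2}\|\,\|\hat\Sigma_Y^{-1/2} - \Sigma_Y^{-1/2}\|\,|X_k - X_l|_p\,|Y_k - Y_m|_q\\ &\quad+\|\hat\Sigma_X^{-1/2} - \Sigma_X^{-1/2}\|\,|X_k - X_l|_p\,|\Sigma_Y^{-1/2}(Y_k - Y_m)|_q\\ &\quad+\|\hat\Sigma_Y^{-1/2} - \Sigma_Y^{-1/2}\|\,|\Sigma_X^{-1/2}(X_k - X_l)|_p\,|Y_k - Y_m|_q,\end{aligned}$$

where the matrix norm $\|A\|$ is the largest eigenvalue of A in absolute value. Now we can separate the three sums in the decomposition of Szekely, et al. (2007, p. 2776, Equation (2.18)) and place factors such as $\|\hat\Sigma_X^{-1/2} - \Sigma_X^{-1/2}\|$ in front of the sums, since they appear in every summand. By the continuity of the map $\Sigma\mapsto\Sigma^{-1/2}$ on positive definite matrices, $\|\hat\Sigma_X^{-1/2} - \Sigma_X^{-1/2}\|$ and $\|\hat\Sigma_Y^{-1/2} - \Sigma_Y^{-1/2}\|$ tend to zero almost surely, and the remaining averages converge to constants (representing some distance covariance components) almost surely as $n\to\infty$; this completes the proof of the first statement. Finally, the property (2.5) of strong consistency of $\widetilde{\mathcal R}_n(X,Y)$ is obtained immediately upon setting $\hat\Sigma_X = S_X$ and $\hat\Sigma_Y = S_Y$. □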
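The decomposition invoked in the proof can be checked numerically. As we read Szekely, et al. (2007, Equation (2.18)), it writes $\mathcal V_n^2 = S_1 + S_2 - 2S_3$ with $S_1 = n^{-2}\sum_{k,l}a_{kl}b_{kl}$, $S_2 = \bar a_{\cdot\cdot}\bar b_{\cdot\cdot}$ and $S_3 = n^{-3}\sum_{k,l,m}a_{kl}b_{km}$. Every summand is then a product $|X_k - X_l|_p|Y_k - Y_m|_q$ of the kind bounded above. The Python/NumPy sketch below uses arbitrary simulated data and confirms that this expression agrees with (2.1).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40
x = rng.normal(size=(n, 3))
y = rng.normal(size=(n, 2)) + x[:, :2] ** 2

a = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)  # a_kl = |X_k - X_l|_p
b = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=-1)  # b_kl = |Y_k - Y_l|_q

# Formula (2.1): average entry of the Schur product of the double-centred matrices.
A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
v2_centred = (A * B).mean()

# Decomposition S1 + S2 - 2 S3: only averages of products of raw distances.
s1 = (a * b).mean()
s2 = a.mean() * b.mean()
s3 = np.einsum("kl,km->", a, b) / n**3
v2_decomposed = s1 + s2 - 2 * s3

print(v2_centred, v2_decomposed)
```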

Szekely, et al. (2007, p. 2783) proposed a test for independence that is based on the sample distance correlation. From their results, we see that the asymptotic properties of the test statistic are not affected by the transition from the standard distance correlation to the affinely invariant distance correlation. Hence, a completely analogous but different test can be stated in terms of the affinely invariant distance correlation. Noting the results of Kosorok (2009, Section 4), we raise the possibility that the specific details can be devised in a judicious, data-dependent way so that the power of the test for independence increases when the transition is made to the affinely invariant distance correlation.
3 The Affinely Invariant Distance Correlation for Multivariate Normal Populations

We now consider the problem of calculating the affinely invariant distance correlation between the random vectors X and Y where $(X,Y)\sim\mathcal N_{p+q}(\mu,\Sigma)$, a multivariate normal distribution with mean vector $\mu\in\mathbb R^{p+q}$ and covariance matrix $\Sigma\in\mathbb R^{(p+q)\times(p+q)}$. We assume, without loss of generality, that Σ is nonsingular; otherwise, the problem reduces to a calculation on a lower-dimensional space.

For the case in which p = q = 1, i.e., the bivariate normal distribution, the problem was solved by Szekely, et al. (2007). In that case, the formula for the affinely invariant distance correlation depends only on ρ, the correlation coefficient, and appears in terms of the functions $\sin^{-1}\rho$ and $(1-\rho^2)^{1/2}$, both of which are well known to be special cases of Gauss' hypergeometric series. Therefore, it is natural to expect that the general case will involve generalizations of Gauss' hypergeometric series, and Theorem 3.1 below demonstrates that such is indeed the case. To formulate this result, we need to recall the rudiments of the theory of zonal polynomials (Muirhead 1982, Chapter 7).

A partition κ is a vector of nonnegative integers $(k_1,\ldots,k_q)$ such that $k_1\ge\cdots\ge k_q$. The integer $|\kappa| = k_1 + \cdots + k_q$ is called the weight of κ; and $\ell(\kappa)$, the length of κ, is the largest integer j such that $k_j > 0$. The zonal polynomial $C_\kappa(A)$ is a mapping from the class of symmetric matrices $A\in\mathbb R^{q\times q}$ to the real line which satisfies several properties, the following of which are crucial for our results:

(a) Let O(q) denote the group of orthogonal matrices in $\mathbb R^{q\times q}$. Then

$$C_\kappa(K'AK) = C_\kappa(A) \tag{3.1}$$

for all $K\in O(q)$; thus, $C_\kappa(A)$ is a symmetric function of the eigenvalues of A.

(b) $C_\kappa(A)$ is homogeneous of degree $|\kappa|$ in A: for any $\delta\in\mathbb R$,

$$C_\kappa(\delta A) = \delta^{|\kappa|}C_\kappa(A). \tag{3.2}$$

(c) If A is of rank r, then $C_\kappa(A) = 0$ whenever $\ell(\kappa) > r$.

(d) For any nonnegative integer k,

$$\sum_{|\kappa|=k}C_\kappa(A) = (\operatorname{tr}A)^k. \tag{3.3}$$



(e) For any symmetric matrices $A_1,A_2\in\mathbb R^{q\times q}$,

$$\int_{O(q)}C_\kappa(K'A_1KA_2)\,dK = \frac{C_\kappa(A_1)\,C_\kappa(A_2)}{C_\kappa(I_q)}, \tag{3.4}$$

where $I_q = \operatorname{diag}(1,\ldots,1)\in\mathbb R^{q\times q}$ denotes the identity matrix and the integral is with respect to the Haar measure on O(q), normalized to have total volume 1.
(f) Let $\lambda_1,\ldots,\lambda_q$ be the eigenvalues of A. Then, for a partition (k) with one part,

$$C_{(k)}(A) = \frac{k!}{(\frac12)_k}\sum_{i_1+\cdots+i_q=k}\ \prod_{j=1}^q\frac{(\frac12)_{i_j}}{i_j!}\lambda_j^{i_j}, \tag{3.5}$$

where the sum is over all nonnegative integers $i_1,\ldots,i_q$ such that $i_1+\cdots+i_q = k$, and

$$(a)_k = \frac{\Gamma(a+k)}{\Gamma(a)} = a(a+1)(a+2)\cdots(a+k-1),$$

$a\in\mathbb C$, is standard notation for the rising factorial. In particular, on setting $\lambda_j = 1$, $j = 1,\ldots,q$, we obtain from (3.5)

$$C_{(k)}(I_q) = \frac{(\frac12q)_k}{(\frac12)_k} \tag{3.6}$$

(Muirhead, 1982, p. 237, equation (18); Gross and Richards, 1987, p. 807, Lemma 6.8).
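Formula (3.5) can be read as a generating-function identity. Since $(1-\lambda x)^{-1/2} = \sum_i(\frac12)_i\lambda^ix^i/i!$, the quantity $(\frac12)_kC_{(k)}(A)/k!$ is the coefficient of $x^k$ in $\prod_j(1-\lambda_jx)^{-1/2}$. The minimal Python/NumPy sketch below is our own, with hypothetical function names. It computes all of $C_{(0)},\ldots,C_{(K)}$ by multiplying truncated power series, which avoids enumerating the index tuples in (3.5). The printed comparisons reproduce $C_{(1)}(A) = \operatorname{tr}A$, the closed form $C_{(2)}(A) = [(\operatorname{tr}A)^2 + 2\operatorname{tr}A^2]/3$ (which also follows from (3.5)), and (3.6).

```python
import math
import numpy as np

def rising(a, k):
    """Rising factorial (a)_k = a (a + 1) ... (a + k - 1)."""
    out = 1.0
    for i in range(k):
        out *= a + i
    return out

def zonal_one_part(eigenvalues, kmax):
    """[C_(0)(A), ..., C_(kmax)(A)] from the eigenvalues of A, following (3.5).

    (1/2)_k C_(k)(A) / k! is the coefficient of x^k in prod_j (1 - lambda_j x)^{-1/2},
    so we multiply the truncated power series of the individual factors."""
    series = np.zeros(kmax + 1)
    series[0] = 1.0
    for lam in eigenvalues:
        factor = np.empty(kmax + 1)
        factor[0] = 1.0
        for i in range(1, kmax + 1):
            # (1/2)_i lam^i / i!, built recursively from the previous coefficient.
            factor[i] = factor[i - 1] * (i - 0.5) / i * lam
        series = np.convolve(series, factor)[: kmax + 1]
    return [series[k] * math.factorial(k) / rising(0.5, k) for k in range(kmax + 1)]

lam = [0.3, 0.5, 0.1]
C = zonal_one_part(lam, 4)
print(C[1], sum(lam))                                           # (3.3) with k = 1
print(C[2], (sum(lam) ** 2 + 2 * sum(v * v for v in lam)) / 3)  # closed form of C_(2)
print(zonal_one_part([1.0] * 3, 4)[4], rising(1.5, 4) / rising(0.5, 4))  # (3.6), q = 3
```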

With these properties of the zonal polynomials, we are ready to state our key result, which obtains an explicit formula for the affinely invariant distance covariance in the case of a Gaussian population of arbitrary dimension and arbitrary positive definite covariance matrix.

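Theorem 3.1. Suppose that $(X,Y)\sim\mathcal N_{p+q}(\mu,\Sigma)$, where

$$\Sigma = \begin{pmatrix}\Sigma_X & \Sigma_{XY}\\ \Sigma_{YX} & \Sigma_Y\end{pmatrix}$$

with $\Sigma_X\in\mathbb R^{p\times p}$, $\Sigma_Y\in\mathbb R^{q\times q}$, and $\Sigma_{XY}\in\mathbb R^{p\times q}$. Then

$$\widetilde{\mathcal V}^2(X,Y) = 4\pi\,\frac{c_{p-1}}{c_p}\,\frac{c_{q-1}}{c_q}\sum_{k=1}^\infty\frac{2^{2k}-2}{k!\,2^{2k}}\,\frac{(\frac12)_k(-\frac12)_k(-\frac12)_k}{(\frac12p)_k(\frac12q)_k}\,C_{(k)}(\Lambda), \tag{3.7}$$

where

$$\Lambda = \Sigma_Y^{-1/2}\Sigma_{YX}\Sigma_X^{-1}\Sigma_{XY}\Sigma_Y^{-1/2}\in\mathbb R^{q\times q}. \tag{3.8}$$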

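Proof. We may assume, with no loss of generality, that μ is the zero vector. Since Σ is positive definite, $\Sigma_X$ and $\Sigma_Y$ both are positive definite, so the inverse square roots, $\Sigma_X^{-1/2}$ and $\Sigma_Y^{-1/2}$, exist.

By considering the standardized variables $\widetilde X = \Sigma_X^{-1/2}X$ and $\widetilde Y = \Sigma_Y^{-1/2}Y$, we may replace the covariance matrix Σ by

$$\widetilde\Sigma = \begin{pmatrix}I_p & \Lambda_{XY}\\ \Lambda_{XY}' & I_q\end{pmatrix},$$

where

$$\Lambda_{XY} = \Sigma_X^{-1/2}\Sigma_{XY}\Sigma_Y^{-1/2}. \tag{3.9}$$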
Once we have made these reductions, it follows that the matrix Λ in (3.8) can be written as $\Lambda = \Lambda_{XY}'\Lambda_{XY}$ and that it has norm less than 1. Indeed, by the partial Iwasawa decomposition of $\widetilde\Sigma$, viz., the identity

$$\begin{pmatrix}I_p & \Lambda_{XY}\\ \Lambda_{XY}' & I_q\end{pmatrix} = \begin{pmatrix}I_p & 0\\ \Lambda_{XY}' & I_q\end{pmatrix}\begin{pmatrix}I_p & 0\\ 0 & I_q - \Lambda_{XY}'\Lambda_{XY}\end{pmatrix}\begin{pmatrix}I_p & \Lambda_{XY}\\ 0 & I_q\end{pmatrix},$$

where the zero matrix of any dimension is denoted by 0, we see that the covariance matrix $\widetilde\Sigma$ is positive definite if and only if $I_q - \Lambda$ is positive definite. Hence, $\Lambda < I_q$ in the Loewner ordering, and therefore $\|\Lambda\| < 1$.
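The equivalence between positive definiteness and $\|\Lambda\| < 1$ is easy to test numerically. In this Python/NumPy sketch (our own illustration, with a randomly generated $\Lambda_{XY}$ rescaled to a prescribed spectral norm), a Cholesky factorization of $\widetilde\Sigma$ succeeds exactly when the largest eigenvalue of $\Lambda = \Lambda_{XY}'\Lambda_{XY}$ is below 1.

```python
import numpy as np

def is_positive_definite(m):
    """Test positive definiteness by attempting a Cholesky factorization."""
    try:
        np.linalg.cholesky(m)
        return True
    except np.linalg.LinAlgError:
        return False

def standardized_covariance(lam_xy):
    """The block matrix [[I_p, Lambda_XY], [Lambda_XY', I_q]]."""
    p, q = lam_xy.shape
    return np.block([[np.eye(p), lam_xy], [lam_xy.T, np.eye(q)]])

rng = np.random.default_rng(3)
direction = rng.normal(size=(3, 2))
direction /= np.linalg.norm(direction, 2)   # spectral norm 1

for scale in (0.5, 0.99, 1.01, 2.0):
    lam_xy = scale * direction
    lam = lam_xy.T @ lam_xy                  # Lambda = Lambda_XY' Lambda_XY, largest eigenvalue scale**2
    print(scale, is_positive_definite(standardized_covariance(lam_xy)),
          np.linalg.eigvalsh(lam).max() < 1.0)
```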

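We proceed to calculate the distance covariance $\mathcal V(\widetilde X,\widetilde Y) = \widetilde{\mathcal V}(X,Y)$. It is well known that the characteristic function of $(\widetilde X,\widetilde Y)$ is

$$f_{\widetilde X,\widetilde Y}(s,t) = \exp\Big[-\tfrac12\big(|s|_p^2 + |t|_q^2 + 2s'\Lambda_{XY}t\big)\Big],$$

where $s\in\mathbb R^p$ and $t\in\mathbb R^q$. Therefore,

$$\big|f_{\widetilde X,\widetilde Y}(s,t) - f_{\widetilde X}(s)f_{\widetilde Y}(t)\big|^2 = \big(1 - \exp(-s'\Lambda_{XY}t)\big)^2\exp\big(-|s|_p^2 - |t|_q^2\big),$$

and hence

$$\begin{aligned}c_pc_q\,\widetilde{\mathcal V}^2(X,Y) &= \int_{\mathbb R^{p+q}}\big(1 - \exp(-s'\Lambda_{XY}t)\big)^2\exp\big(-|s|_p^2 - |t|_q^2\big)\,\frac{ds\,dt}{|s|_p^{p+1}\,|t|_q^{q+1}}\\ &= \int_{\mathbb R^{p+q}}\big(1 - \exp(s'\Lambda_{XY}t)\big)^2\exp\big(-|s|_p^2 - |t|_q^2\big)\,\frac{ds\,dt}{|s|_p^{p+1}\,|t|_q^{q+1}},\end{aligned} \tag{3.10}$$

where the latter integral is obtained by making the change of variables $s\mapsto-s$ within the former integral.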

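By a Taylor series expansion, we obtain

$$\big(1 - \exp(s'\Lambda_{XY}t)\big)^2 = 1 - 2\exp(s'\Lambda_{XY}t) + \exp(2s'\Lambda_{XY}t) = \sum_{k=2}^\infty\frac{2^k - 2}{k!}\,(s'\Lambda_{XY}t)^k.$$

Substituting this series into (3.10) and interchanging summation and integration, a procedure which is straightforward to verify by means of Fubini's theorem, and noting that the odd-order terms integrate to zero, we obtain

$$c_pc_q\,\widetilde{\mathcal V}^2(X,Y) = \sum_{k=1}^\infty\frac{2^{2k}-2}{(2k)!}\int_{\mathbb R^{p+q}}(s'\Lambda_{XY}t)^{2k}\exp\big(-|s|_p^2 - |t|_q^2\big)\,\frac{ds\,dt}{|s|_p^{p+1}\,|t|_q^{q+1}}. \tag{3.11}$$

To calculate, for $k\ge1$, the integral

$$\int_{\mathbb R^{p+q}}(s'\Lambda_{XY}t)^{2k}\exp\big(-|s|_p^2 - |t|_q^2\big)\,\frac{ds\,dt}{|s|_p^{p+1}\,|t|_q^{q+1}}, \tag{3.12}$$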
we change variables to polar coordinates, putting $s = r_x\theta$ and $t = r_y\phi$, where $r_x,r_y > 0$, $\theta = (\theta_1,\ldots,\theta_p)'\in S^{p-1}$, and $\phi = (\phi_1,\ldots,\phi_q)'\in S^{q-1}$. Then the integral (3.12) separates into a product of multiple integrals over $(r_x,r_y)$ and over $(\theta,\phi)$, respectively. Combining the factor $(r_xr_y)^{2k}$, the Jacobians $r_x^{p-1}r_y^{q-1}$ and the weights $r_x^{-(p+1)}r_y^{-(q+1)}$, the integrals over $r_x$ and $r_y$ are standard gamma integrals,

$$\int_0^\infty\!\!\int_0^\infty r_x^{2k-2}\,r_y^{2k-2}\exp(-r_x^2 - r_y^2)\,dr_x\,dr_y = \tfrac14\big[\Gamma(k-\tfrac12)\big]^2 = \big[(-\tfrac12)_k\big]^2\pi, \tag{3.13}$$

and the remaining factor is the integral

$$\int_{S^{q-1}}\int_{S^{p-1}}(\theta'\Lambda_{XY}\phi)^{2k}\,d\theta\,d\phi, \tag{3.14}$$

where dθ and dφ are unnormalized surface measures on $S^{p-1}$ and $S^{q-1}$, respectively.
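By a standard invariance argument,

$$\int_{S^{p-1}}(\theta'v)^{2k}\,d\theta = |v|_p^{2k}\int_{S^{p-1}}\theta_1^{2k}\,d\theta,$$

$v\in\mathbb R^p$. Setting $v = \Lambda_{XY}\phi$ and applying some well-known properties of the surface measure dθ, we obtain

$$\int_{S^{p-1}}(\theta'\Lambda_{XY}\phi)^{2k}\,d\theta = |\Lambda_{XY}\phi|_p^{2k}\int_{S^{p-1}}\theta_1^{2k}\,d\theta = 2c_{p-1}\,\frac{\Gamma(k+\frac12)\,\Gamma(\frac12p)}{\Gamma(\frac12)\,\Gamma(k+\frac12p)}\,(\phi'\Lambda\phi)^k = 2c_{p-1}\,\frac{(\frac12)_k}{(\frac12p)_k}\,(\phi'\Lambda\phi)^k,$$

since $|\Lambda_{XY}\phi|_p^2 = \phi'\Lambda_{XY}'\Lambda_{XY}\phi = \phi'\Lambda\phi$ and the total surface measure of $S^{p-1}$ equals $2c_{p-1}$.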
Therefore, in order to evaluate (3.14), it remains to evaluate

$$J_k(\Lambda) = \int_{S^{q-1}}(\phi'\Lambda\phi)^k\,d\phi.$$

Since the surface measure is invariant under the transformation $\phi\mapsto K\phi$, $K\in O(q)$, it follows that $J_k(\Lambda) = J_k(K'\Lambda K)$ for all $K\in O(q)$. Integrating with respect to the normalized Haar measure on the orthogonal group, we conclude that

$$J_k(\Lambda) = \int_{O(q)}J_k(K'\Lambda K)\,dK = \int_{S^{q-1}}\int_{O(q)}(\phi'K'\Lambda K\phi)^k\,dK\,d\phi. \tag{3.15}$$



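We now use the properties of the zonal polynomials. By (3.3),

$$(\phi'K'\Lambda K\phi)^k = \big(\operatorname{tr}K'\Lambda K\phi\phi'\big)^k = \sum_{|\kappa|=k}C_\kappa(K'\Lambda K\phi\phi');$$

therefore, by (3.4),

$$\int_{O(q)}(\phi'K'\Lambda K\phi)^k\,dK = \sum_{|\kappa|=k}\int_{O(q)}C_\kappa(K'\Lambda K\phi\phi')\,dK = \sum_{|\kappa|=k}\frac{C_\kappa(\Lambda)\,C_\kappa(\phi\phi')}{C_\kappa(I_q)}.$$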
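Since $\phi\phi'$ is of rank 1, property (c) gives $C_\kappa(\phi\phi') = 0$ if $\ell(\kappa) > 1$; it now follows, by (3.3) and the fact that $\phi\in S^{q-1}$, that

$$C_{(k)}(\phi\phi') = \sum_{|\kappa|=k}C_\kappa(\phi\phi') = (\operatorname{tr}\phi\phi')^k = (\phi'\phi)^k = 1.$$

Therefore,

$$\int_{O(q)}(\phi'K'\Lambda K\phi)^k\,dK = \frac{C_{(k)}(\Lambda)}{C_{(k)}(I_q)} = \frac{(\frac12)_k}{(\frac12q)_k}\,C_{(k)}(\Lambda),$$

where the last equality follows by (3.6). Substituting this result at (3.15), and using $\int_{S^{q-1}}d\phi = 2c_{q-1}$, we obtain

$$J_k(\Lambda) = 2c_{q-1}\,\frac{(\frac12)_k}{(\frac12q)_k}\,C_{(k)}(\Lambda).$$

Collecting together these results, we find that

$$c_pc_q\,\widetilde{\mathcal V}^2(X,Y) = \sum_{k=1}^\infty\frac{2^{2k}-2}{(2k)!}\,\pi\big[(-\tfrac12)_k\big]^2\cdot2c_{p-1}\frac{(\frac12)_k}{(\frac12p)_k}\cdot2c_{q-1}\frac{(\frac12)_k}{(\frac12q)_k}\,C_{(k)}(\Lambda),$$

and using the well-known identity $(2k)! = k!\,2^{2k}(\frac12)_k$, we obtain the representation (3.7), as desired.

□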
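Several constants enter the proof, and each can be confirmed numerically. These are the Taylor coefficients $(2^k-2)/k!$, the gamma integral (3.13), the identity $(2k)! = k!\,2^{2k}(\frac12)_k$, and the spherical moment $\int_{S^{p-1}}\theta_1^{2k}\,d\theta = 2c_{p-1}(\frac12)_k/(\frac12p)_k$. The moment is checked here for p = 2 (by quadrature on the circle) and for p = 3 (where it equals $4\pi/(2k+1)$). The Python sketch below is our own check and is not part of the original argument.

```python
import math
import numpy as np

def rising(a, k):
    out = 1.0
    for i in range(k):
        out *= a + i
    return out

def c(p):
    return math.pi ** ((p + 1) / 2) / math.gamma((p + 1) / 2)

def rel_err(u, v):
    return abs(u - v) / max(1.0, abs(v))

errors = []

# Taylor coefficients of (1 - e^x)^2, as used before (3.11).
x = 0.3
series = sum((2.0**k - 2.0) * x**k / math.factorial(k) for k in range(2, 40))
errors.append(rel_err(series, (1.0 - math.exp(x)) ** 2))

phi = np.linspace(0.0, 2.0 * math.pi, 4096, endpoint=False)
for k in range(1, 8):
    # (3.13): (1/4) Gamma(k - 1/2)^2 = pi [(-1/2)_k]^2.
    errors.append(rel_err(0.25 * math.gamma(k - 0.5) ** 2, math.pi * rising(-0.5, k) ** 2))
    # (2k)! = k! 2^{2k} (1/2)_k.
    errors.append(rel_err(float(math.factorial(2 * k)), math.factorial(k) * 4.0**k * rising(0.5, k)))
    # Spherical moments: p = 2 by an equispaced rule (exact for trigonometric polynomials) ...
    circle = 2.0 * math.pi * float(np.mean(np.cos(phi) ** (2 * k)))
    errors.append(rel_err(circle, 2.0 * c(1) * rising(0.5, k) / rising(1.0, k)))
    # ... and p = 3 in closed form.
    errors.append(rel_err(4.0 * math.pi / (2 * k + 1), 2.0 * c(2) * rising(0.5, k) / rising(1.5, k)))

print(max(errors))
```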



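We remark that by interchanging the roles of X and Y in Theorem 3.1, we would obtain (3.7) with Λ in (3.8) replaced by

$$\widetilde\Lambda = \Sigma_X^{-1/2}\Sigma_{XY}\Sigma_Y^{-1}\Sigma_{YX}\Sigma_X^{-1/2}\in\mathbb R^{p\times p}.$$

Since $\Lambda = \Lambda_{XY}'\Lambda_{XY}$ and $\widetilde\Lambda = \Lambda_{XY}\Lambda_{XY}'$ have the same nonzero eigenvalues, and noting that $C_\kappa(A)$ depends only on the eigenvalues of A (zero eigenvalues do not contribute to (3.5)), it follows that $C_{(k)}(\Lambda) = C_{(k)}(\widetilde\Lambda)$. Therefore, the series representation (3.7) for $\widetilde{\mathcal V}^2(X,Y)$ remains unchanged if the roles of X and Y are interchanged.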
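The series (3.7) is straightforward to evaluate once the one-part zonal polynomials are available. The Python/NumPy sketch below is our own and is not the Koev and Edelman algorithm used later in this section. It computes $g_k = (\frac12)_kC_{(k)}(\Lambda)/k!$ as the power-series coefficients of $\prod_j(1-\lambda_jx)^{-1/2}$, a restatement of (3.5). The remaining ratio of rising factorials is accumulated recursively to avoid overflow. The check confirms that swapping X and Y leaves the value unchanged. It also confirms that, for p = q = 1, the sum reproduces $\frac4\pi[\rho\sin^{-1}\rho + (1-\rho^2)^{1/2} - \rho\sin^{-1}(\rho/2) - (4-\rho^2)^{1/2} + 1]$, which is what (3.16) and (3.17) give later in this section.

```python
import math
import numpy as np

def c(p):
    return math.pi ** ((p + 1) / 2) / math.gamma((p + 1) / 2)

def one_part_series(eigenvalues, kmax):
    """g_k = (1/2)_k C_(k)(Lambda) / k!: power-series coefficients of
    prod_j (1 - lambda_j x)^{-1/2}, a restatement of (3.5)."""
    g = np.zeros(kmax + 1)
    g[0] = 1.0
    for lam in eigenvalues:
        f = np.empty(kmax + 1)
        f[0] = 1.0
        for i in range(1, kmax + 1):
            f[i] = f[i - 1] * (i - 0.5) / i * lam
        g = np.convolve(g, f)[: kmax + 1]
    return g

def affine_dcov2_normal(lam_xy, kmax=400):
    """Truncation of the series (3.7); lam_xy is the p x q matrix Lambda_XY of (3.9)."""
    lam_xy = np.atleast_2d(np.asarray(lam_xy, dtype=float))
    p, q = lam_xy.shape
    g = one_part_series(np.linalg.eigvalsh(lam_xy.T @ lam_xy), kmax)
    h, total = 1.0, 0.0
    for k in range(1, kmax + 1):
        # h = (-1/2)_k^2 / ((p/2)_k (q/2)_k), updated recursively.
        h *= (k - 1.5) ** 2 / ((p / 2 + k - 1) * (q / 2 + k - 1))
        # (2^{2k} - 2) / (k! 2^{2k}) * (1/2)_k * C_(k)(Lambda) = (1 - 2^{1-2k}) * g_k.
        total += (1.0 - 2.0 ** (1 - 2 * k)) * h * g[k]
    return 4 * math.pi * c(p - 1) * c(q - 1) / (c(p) * c(q)) * total

rng = np.random.default_rng(4)
lam_xy = rng.normal(size=(3, 2))
lam_xy *= 0.8 / np.linalg.norm(lam_xy, 2)    # so that ||Lambda|| = 0.64 < 1
print(affine_dcov2_normal(lam_xy), affine_dcov2_normal(lam_xy.T))
```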

The series appearing in Theorem 3.1 can be expressed in terms of the generalized hypergeometric functions of matrix argument (James, 1964; Muirhead, 1982; Gross and Richards, 1987). For this purpose, we introduce the partitional rising factorial for any $a\in\mathbb C$ and any partition $\kappa = (k_1,\ldots,k_q)$ as

$$(a)_\kappa = \prod_{j=1}^q\big(a - \tfrac12(j-1)\big)_{k_j}.$$

Let $\alpha_1,\ldots,\alpha_l,\beta_1,\ldots,\beta_m\in\mathbb C$, where $-\beta_i + \frac12(j-1)$ is not a nonnegative integer, for all $i = 1,\ldots,m$ and $j = 1,\ldots,q$. Then the ${}_lF_m$ generalized hypergeometric function of matrix argument is defined as

$${}_lF_m(\alpha_1,\ldots,\alpha_l;\beta_1,\ldots,\beta_m;S) = \sum_{k=0}^\infty\sum_{|\kappa|=k}\frac{(\alpha_1)_\kappa\cdots(\alpha_l)_\kappa}{(\beta_1)_\kappa\cdots(\beta_m)_\kappa}\,\frac{C_\kappa(S)}{k!},$$

where S is a symmetric matrix. A complete analysis of the convergence properties of this series was derived by Gross and Richards (1987, p. 804, Theorem 6.3), and we refer the reader to that paper for the details.
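Corollary 3.2. In the setting of Theorem 3.1, we have

$$\widetilde{\mathcal V}^2(X,Y) = 4\pi\,\frac{c_{p-1}}{c_p}\,\frac{c_{q-1}}{c_q}\Big({}_3F_2\big(\tfrac12,-\tfrac12,-\tfrac12;\tfrac12p,\tfrac12q;\Lambda\big) - 2\,{}_3F_2\big(\tfrac12,-\tfrac12,-\tfrac12;\tfrac12p,\tfrac12q;\tfrac14\Lambda\big) + 1\Big). \tag{3.16}$$

Proof. It is evident that

$$\big(\tfrac12\big)_\kappa = \begin{cases}\big(\tfrac12\big)_k, & \text{if } \ell(\kappa)\le1,\\ 0, & \text{if } \ell(\kappa) > 1.\end{cases}$$

Therefore, we now can write the series in (3.7), up to a multiplicative constant, in terms of a generalized hypergeometric function of matrix argument, in that

$$\begin{aligned}\sum_{k=1}^\infty\frac{2^{2k}-2}{k!\,2^{2k}}\,\frac{(\frac12)_k(-\frac12)_k(-\frac12)_k}{(\frac12p)_k(\frac12q)_k}\,C_{(k)}(\Lambda) &= \sum_{k=1}^\infty\frac{1}{k!}\sum_{|\kappa|=k}\frac{(\frac12)_\kappa(-\frac12)_\kappa(-\frac12)_\kappa}{(\frac12p)_\kappa(\frac12q)_\kappa}\,C_\kappa(\Lambda) - 2\sum_{k=1}^\infty\frac{1}{k!\,2^{2k}}\sum_{|\kappa|=k}\frac{(\frac12)_\kappa(-\frac12)_\kappa(-\frac12)_\kappa}{(\frac12p)_\kappa(\frac12q)_\kappa}\,C_\kappa(\Lambda)\\ &= \Big[{}_3F_2\big(\tfrac12,-\tfrac12,-\tfrac12;\tfrac12p,\tfrac12q;\Lambda\big) - 1\Big] - 2\Big[{}_3F_2\big(\tfrac12,-\tfrac12,-\tfrac12;\tfrac12p,\tfrac12q;\tfrac14\Lambda\big) - 1\Big],\end{aligned}$$

where the last step uses (3.2). As proved by Gross and Richards (1987, p. 804, Theorem 6.3), the zonal polynomial series expansion for the ${}_3F_2(\frac12,-\frac12,-\frac12;\frac12p,\frac12q;\Lambda)$ generalized hypergeometric function of matrix argument converges absolutely if $\|\Lambda\| < 1$, and so we have absolute convergence at (3.16) for all positive definite Σ. □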



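Consider the case in which q = 1 and p is arbitrary. Then Λ is a scalar; say, $\Lambda = \rho^2$ for some $\rho\in(-1,1)$. Then the ${}_3F_2$ generalized hypergeometric functions in (3.16) each reduce to a Gaussian hypergeometric function, denoted by ${}_2F_1$ (the parameters $\frac12$ and $\frac12q = \frac12$ cancel, and $c_0 = 1$, $c_1 = \pi$), and (3.16) becomes

$$\widetilde{\mathcal V}^2(X,Y) = 4\,\frac{c_{p-1}}{c_p}\Big({}_2F_1\big(-\tfrac12,-\tfrac12;\tfrac12p;\rho^2\big) - 2\,{}_2F_1\big(-\tfrac12,-\tfrac12;\tfrac12p;\tfrac14\rho^2\big) + 1\Big).$$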



For the case in which p = q = 1, we may identify ρ with the Pearson correlation coefficient, and the hypergeometric series can be expressed in terms of elementary functions. By well-known results (Andrews, Askey, and Roy (2000), pp. 64 and 94),

$${}_2F_1\big(-\tfrac12,-\tfrac12;\tfrac12;\rho^2\big) = \rho\sin^{-1}\rho + (1-\rho^2)^{1/2}, \tag{3.17}$$

and thus we derive the same result for p = q = 1 as in Szekely, et al. (2007, p. 2786).
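For p = q = 1 everything reduces to elementary functions. The Python sketch below compares a plain power series for the ${}_2F_1$ function with the right-hand side of (3.17). It then assembles $\widetilde{\mathcal R}(X,Y)$ for a bivariate normal pair from (3.16) and (3.20), using ${}_2F_1(-\frac12,-\frac12;\frac12;1) = \pi/2$. The helper names are ours, and the series helper is only meant for $|z| < 1$.

```python
import math

def hyp2f1_series(a, b, c, z, terms=200):
    """Plain power series for 2F1(a, b; c; z), adequate for |z| < 1."""
    total, term = 1.0, 1.0
    for k in range(terms):
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * z
        total += term
    return total

def f_closed(rho):
    """Right-hand side of (3.17), i.e. 2F1(-1/2, -1/2; 1/2; rho^2)."""
    return rho * math.asin(rho) + math.sqrt(1.0 - rho ** 2)

def affine_dcor_bivariate(rho):
    """R~(X, Y) for p = q = 1; the prefactor 4 / pi cancels in the ratio."""
    num = f_closed(rho) - 2.0 * f_closed(rho / 2.0) + 1.0
    den = math.pi / 2.0 - 2.0 * f_closed(0.5) + 1.0
    return math.sqrt(num / den)

for rho in (0.1, 0.5, 0.9):
    print(rho, hyp2f1_series(-0.5, -0.5, 0.5, rho ** 2), f_closed(rho), affine_dcor_bivariate(rho))
```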

For cases in which q = 1 and p is odd, we can again obtain explicit expressions for $\widetilde{\mathcal V}^2(X,Y)$. In such cases, the ${}_3F_2$ generalized hypergeometric functions in (3.16) reduce to Gaussian hypergeometric functions of the form ${}_2F_1(-\frac12,-\frac12;k+\frac12;\rho^2)$, $k\in\mathbb N$, and it can be shown that these latter functions are expressible in closed form in terms of elementary functions and the $\sin^{-1}(\cdot)$ function. For instance, for p = 3, the contiguous relations for the ${}_2F_1$ functions can be used to show that

$${}_2F_1\big(-\tfrac12,-\tfrac12;\tfrac32;\rho^2\big) = \frac{3(1-\rho^2)^{1/2}}{4} + \frac{(1+2\rho^2)\sin^{-1}\rho}{4\rho}. \tag{3.18}$$

Further, by repeated application of the same contiguous relations, it can be shown that for k = 2, 3, 4, …,

$${}_2F_1\big(-\tfrac12,-\tfrac12;k+\tfrac12;\rho^2\big) = \rho^{-2(k-1)}(1-\rho^2)^{1/2}P_{k-1}(\rho^2) + \rho^{-(2k-1)}Q_k(\rho^2)\sin^{-1}\rho,$$

where $P_{k-1}$ and $Q_k$ are polynomials of degrees k − 1 and k, respectively. Therefore, for q = 1 and p odd, the distance covariance $\widetilde{\mathcal V}^2(X,Y)$ can be expressed in closed form in terms of elementary functions and the $\sin^{-1}(\cdot)$ function.
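Identity (3.18) can be spot-checked against the defining power series. It can also be checked at $\rho = 1$, where Gauss' summation formula gives ${}_2F_1(-\frac12,-\frac12;\frac32;1) = \Gamma(\frac32)\Gamma(\frac52)/\Gamma(2)^2 = 3\pi/8$. This Python sketch is our own numerical check, with hypothetical helper names, and is not part of the derivation.

```python
import math

def hyp2f1_series(a, b, c, z, terms=300):
    """Plain power series for 2F1(a, b; c; z), adequate for |z| < 1."""
    total, term = 1.0, 1.0
    for k in range(terms):
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * z
        total += term
    return total

def f_p3(rho):
    """Right-hand side of (3.18), for 0 < |rho| <= 1."""
    return 0.75 * math.sqrt(1.0 - rho ** 2) + (1.0 + 2.0 * rho ** 2) * math.asin(rho) / (4.0 * rho)

for rho in (0.05, 0.5, 0.95):
    print(rho, hyp2f1_series(-0.5, -0.5, 1.5, rho ** 2), f_p3(rho))

# At rho = 1, compare with Gauss' value Gamma(3/2) Gamma(5/2) / Gamma(2)^2 = 3 pi / 8.
print(f_p3(1.0), math.gamma(1.5) * math.gamma(2.5) / math.gamma(2.0) ** 2)
```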

The appearance of the generalized hypergeometric functions of matrix argument also yields a useful expression for the affinely invariant distance variance. In order to state this result, we shall define for each positive integer p the quantity

$$A(p) = \frac{\Gamma(\frac12p)\,\Gamma(\frac12p+1)}{\big[\Gamma(\frac12(p+1))\big]^2} - 2\,{}_2F_1\big(-\tfrac12,-\tfrac12;\tfrac12p;\tfrac14\big) + 1. \tag{3.19}$$



Corollary 3.3. In the setting of Theorem 3.1, we have

$$\widetilde{\mathcal V}^2(X,X) = 4\pi\,\frac{c_{p-1}^2}{c_p^2}\,A(p). \tag{3.20}$$



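Proof. We are in the special case of Theorem 3.1 for which X = Y, so that p = q and $\Lambda = I_p$. By applying (3.6) we can write the series in (3.7) as

$$\begin{aligned}4\pi\frac{c_{p-1}^2}{c_p^2}\sum_{k=1}^\infty\frac{2^{2k}-2}{k!\,2^{2k}}\,\frac{(\frac12)_k(-\frac12)_k(-\frac12)_k}{(\frac12p)_k(\frac12p)_k}\,\frac{(\frac12p)_k}{(\frac12)_k} &= 4\pi\frac{c_{p-1}^2}{c_p^2}\sum_{k=1}^\infty\frac{2^{2k}-2}{k!\,2^{2k}}\,\frac{(-\frac12)_k(-\frac12)_k}{(\frac12p)_k}\\ &= 4\pi\frac{c_{p-1}^2}{c_p^2}\Big(\Big[{}_2F_1\big(-\tfrac12,-\tfrac12;\tfrac12p;1\big) - 1\Big] - 2\Big[{}_2F_1\big(-\tfrac12,-\tfrac12;\tfrac12p;\tfrac14\big) - 1\Big]\Big).\end{aligned}$$

By Gauss' theorem for hypergeometric functions, the series ${}_2F_1(-\frac12,-\frac12;\frac12p;z)$ also converges for the special value z = 1, and then

$${}_2F_1\big(-\tfrac12,-\tfrac12;\tfrac12p;1\big) = \frac{\Gamma(\frac12p)\,\Gamma(\frac12p+1)}{\big[\Gamma(\frac12(p+1))\big]^2},$$

thereby completing the proof. □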



Figure 1: The affinely invariant distance correlation for subvectors of a multivariate 
normal population, where p = q = 2, as a function of the parameter r in three distinct 
settings. The solid diagonal line is the identity function and is provided to serve as a 
reference for the three distance correlation functions. See the text for details. 



For cases in which p is odd, we can proceed as explained at (3.18) to obtain explicit values for the Gaussian hypergeometric function remaining in A(p), and hence in (3.20). This leads in such cases to explicit expressions for the exact value of $\widetilde{\mathcal V}^2(X,X)$. In particular, if p = 1 then it follows from (1.2) and (3.17) that

$$\widetilde{\mathcal V}^2(X,X) = \frac43 - \frac{4(\sqrt3 - 1)}{\pi},$$

and for p = 3, we deduce from (1.2) and (3.18) that

$$\widetilde{\mathcal V}^2(X,X) = 2 - \frac{4(3\sqrt3 - 4)}{\pi}.$$
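The closed forms for p = 1 and p = 3 can be cross-checked against (3.19) and (3.20) directly. In this Python sketch, which is our own with hypothetical names, the Gauss term of A(p) is evaluated through `math.lgamma`. The remaining ${}_2F_1$ at $z = \frac14$ is summed as a plain series.

```python
import math

def c(p):
    return math.pi ** ((p + 1) / 2) / math.gamma((p + 1) / 2)

def hyp2f1_series(a, b, cc, z, terms=200):
    total, term = 1.0, 1.0
    for k in range(terms):
        term *= (a + k) * (b + k) / ((cc + k) * (k + 1)) * z
        total += term
    return total

def A(p):
    """A(p) of (3.19); its first term is Gauss' value of 2F1(-1/2, -1/2; p/2; 1)."""
    gauss = math.exp(math.lgamma(p / 2) + math.lgamma(p / 2 + 1) - 2 * math.lgamma((p + 1) / 2))
    return gauss - 2.0 * hyp2f1_series(-0.5, -0.5, p / 2, 0.25) + 1.0

def affine_dvar2_normal(p):
    """Affinely invariant distance variance (3.20) of a p-variate normal vector."""
    return 4 * math.pi * (c(p - 1) / c(p)) ** 2 * A(p)

print(affine_dvar2_normal(1), 4 / 3 - 4 * (math.sqrt(3) - 1) / math.pi)
print(affine_dvar2_normal(3), 2 - 4 * (3 * math.sqrt(3) - 4) / math.pi)
```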



Corollaries 3.2 and 3.3 enable the explicit and efficient calculation of the affinely invariant distance correlation (1.6) in the case of subvectors of a multivariate normal population. In doing so, we use the algorithm of Koev and Edelman (2006) to evaluate the generalized hypergeometric function of matrix argument, with C and Matlab code being available at these authors' websites.

Figure 1 concerns the case p = q = 2 in various settings, in which the matrix $\Lambda_{XY}$ depends on a single parameter r only. The dotted line shows the affinely invariant distance correlation when

$$\Lambda_{XY} = \begin{pmatrix}r & 0\\ 0 & 0\end{pmatrix};$$
Figure 2: The affinely invariant distance correlation between the p- and q-dimensional subvectors of a (p+q)-dimensional multivariate normal population, where (a) p = q = 2 and $\Lambda_{XY} = \operatorname{diag}(r,s)$, and (b) p = 2, q = 1 and $\Lambda_{XY} = (r,s)'$.

this is the case with the weakest dependence considered here. The dash-dotted line applies when

$$\Lambda_{XY} = \begin{pmatrix}r & 0\\ 0 & r\end{pmatrix}.$$

The strongest dependence corresponds to the dashed line, which shows the affinely invariant distance correlation when

$$\Lambda_{XY} = \begin{pmatrix}r & r\\ r & r\end{pmatrix};$$

in this case we need to assume that $0\le r<\frac12$ in order to retain positive definiteness.

In Figure 2, panel (a) shows the affinely invariant distance correlation when p = q = 2 and

$$\Lambda_{XY} = \begin{pmatrix}r & 0\\ 0 & s\end{pmatrix},$$

where $0\le r,s<1$. With reference to Figure 1, the margins correspond to the dotted line and the diagonal corresponds to the dash-dotted line.
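Panel (a) can be reproduced in outline from (3.7) and (3.20). In $\widetilde{\mathcal R}^2 = \widetilde{\mathcal V}^2(X,Y)/\sqrt{\widetilde{\mathcal V}^2(X,X)\,\widetilde{\mathcal V}^2(Y,Y)}$ the prefactor $4\pi c_{p-1}c_{q-1}/(c_pc_q)$ cancels, which leaves the bare series divided by $\sqrt{A(p)A(q)}$. The Python/NumPy sketch below is our own (a direct truncated series, not the Koev and Edelman algorithm). It evaluates the margins $\operatorname{diag}(r,0)$ and the diagonal $\operatorname{diag}(r,r)$ of panel (a).

```python
import math
import numpy as np

def hyp2f1_series(a, b, c, z, terms=200):
    total, term = 1.0, 1.0
    for k in range(terms):
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * z
        total += term
    return total

def A(p):
    """A(p) of (3.19)."""
    gauss = math.exp(math.lgamma(p / 2) + math.lgamma(p / 2 + 1) - 2 * math.lgamma((p + 1) / 2))
    return gauss - 2.0 * hyp2f1_series(-0.5, -0.5, p / 2, 0.25) + 1.0

def one_part_series(eigenvalues, kmax):
    """g_k = (1/2)_k C_(k)(Lambda) / k!, via the product of (1 - lambda_j x)^{-1/2}."""
    g = np.zeros(kmax + 1)
    g[0] = 1.0
    for lam in eigenvalues:
        f = np.empty(kmax + 1)
        f[0] = 1.0
        for i in range(1, kmax + 1):
            f[i] = f[i - 1] * (i - 0.5) / i * lam
        g = np.convolve(g, f)[: kmax + 1]
    return g

def affine_dcor_normal(lam_xy, kmax=400):
    """R~(X, Y) from (3.7) and (3.20); the prefactor 4 pi c_{p-1} c_{q-1} / (c_p c_q) cancels."""
    lam_xy = np.atleast_2d(np.asarray(lam_xy, dtype=float))
    p, q = lam_xy.shape
    g = one_part_series(np.linalg.eigvalsh(lam_xy.T @ lam_xy), kmax)
    h, series = 1.0, 0.0
    for k in range(1, kmax + 1):
        h *= (k - 1.5) ** 2 / ((p / 2 + k - 1) * (q / 2 + k - 1))
        series += (1.0 - 2.0 ** (1 - 2 * k)) * h * g[k]
    return math.sqrt(series / math.sqrt(A(p) * A(q)))

for r in (0.2, 0.5, 0.8):
    dotted = affine_dcor_normal(np.diag([r, 0.0]))       # margin of panel (a)
    dash_dotted = affine_dcor_normal(np.diag([r, r]))    # diagonal of panel (a)
    print(r, dotted, dash_dotted)
```

Every term of (3.7) is nonnegative and nondecreasing in each eigenvalue of Λ, so the margin values stay below the diagonal values for the same r.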

Panel (b) of Figure 2 concerns the case in which p = 2, q = 1 and $\Lambda_{XY} = (r,s)'$, where $r^2+s^2<1$. Here, the affinely invariant distance correlation attains an upper limit as $r^2+s^2\uparrow1$, and we have evaluated that limit numerically as 0.8252.
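The limiting value in panel (b) can be recomputed from (3.16) and (3.20). With q = 1, Λ is the scalar $r^2+s^2$. With p = 2 the ${}_3F_2$ in (3.16) reduces to ${}_2F_1(-\frac12,-\frac12;1;\cdot)$, whose value at 1 follows from Gauss' formula. At $r^2+s^2 = 1$ the bare series therefore equals A(2), and the limit is $(A(2)/A(1))^{1/4}$. The Python sketch below is our own check of the quoted value, with hypothetical function names.

```python
import math

def hyp2f1_series(a, b, c, z, terms=400):
    """Power series for 2F1(a, b; c; z), adequate for |z| < 1."""
    total, term = 1.0, 1.0
    for k in range(terms):
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * z
        total += term
    return total

def gauss_at_one(a, b, c):
    """Gauss' summation formula for 2F1(a, b; c; 1), valid when c - a - b > 0."""
    return math.gamma(c) * math.gamma(c - a - b) / (math.gamma(c - a) * math.gamma(c - b))

def A(p):
    """A(p) of (3.19)."""
    return gauss_at_one(-0.5, -0.5, p / 2) - 2.0 * hyp2f1_series(-0.5, -0.5, p / 2, 0.25) + 1.0

def affine_dcor_p2_q1(rho2):
    """R~(X, Y) for p = 2, q = 1, where Lambda = r^2 + s^2 = rho2 is a scalar."""
    def f(z):
        # With q = 1 and p = 2, the 3F2 in (3.16) reduces to 2F1(-1/2, -1/2; 1; z).
        return gauss_at_one(-0.5, -0.5, 1.0) if z == 1.0 else hyp2f1_series(-0.5, -0.5, 1.0, z)
    series = f(rho2) - 2.0 * f(rho2 / 4.0) + 1.0
    return math.sqrt(series / math.sqrt(A(2) * A(1)))

for rho2 in (0.5, 0.9, 0.99, 1.0):
    print(rho2, affine_dcor_p2_q1(rho2))
```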

4 Limit Theorems 

We now study the limiting behavior of the affinely invariant distance correlation measures for subvectors of multivariate normal populations.
Our first result quantifies the asymptotic decay of the affinely invariant distance correlation in the case in which the cross-covariance matrix converges to the zero matrix, in that

$$\operatorname{tr}(\Lambda) = \|\Lambda_{XY}\|_F^2\to0,$$

where $\|\cdot\|_F$ denotes the Frobenius norm, and the matrices $\Lambda = \Lambda_{XY}'\Lambda_{XY}$ and $\Lambda_{XY}$ are defined in (3.8) and (3.9), respectively.

Theorem 4.1. Suppose that $(X,Y)\sim\mathcal N_{p+q}(\mu,\Sigma)$, where

$$\Sigma = \begin{pmatrix}\Sigma_X & \Sigma_{XY}\\ \Sigma_{YX} & \Sigma_Y\end{pmatrix}$$

with $\Sigma_X\in\mathbb R^{p\times p}$ and $\Sigma_Y\in\mathbb R^{q\times q}$ being positive definite, and suppose that the matrix Λ in (3.8) has positive trace. Then

$$\lim_{\operatorname{tr}(\Lambda)\to0}\frac{\widetilde{\mathcal R}^2(X,Y)}{\operatorname{tr}(\Lambda)} = \frac{1}{4pq\sqrt{A(p)A(q)}}, \tag{4.1}$$

where A(p) is defined in (3.19).



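Proof. We first note that $\widetilde{\mathcal V}^2(X,X)$ and $\widetilde{\mathcal V}^2(Y,Y)$ do not depend on $\Sigma_{XY}$, as can be seen from their explicit representations in terms of A(p) and A(q) given in (3.20); in particular,

$$\sqrt{\widetilde{\mathcal V}^2(X,X)\,\widetilde{\mathcal V}^2(Y,Y)} = 4\pi\,\frac{c_{p-1}c_{q-1}}{c_pc_q}\,\sqrt{A(p)A(q)}.$$

In studying the asymptotic behavior of $\widetilde{\mathcal V}^2(X,Y)$, we may interchange the limit and the summation in the series representation (3.7). Hence, it suffices to find the limit term-by-term. Since $C_{(1)}(\Lambda) = \operatorname{tr}(\Lambda)$, the ratio of the term for k = 1 and $\operatorname{tr}(\Lambda)$ equals

$$\frac{c_{p-1}c_{q-1}}{c_pc_q}\,\frac{\pi}{pq}.$$

For $k\ge2$, it follows from (3.5) that $C_{(k)}(\Lambda)$ is a sum of monomials in the eigenvalues of Λ, with each monomial being of degree k, which is greater than the degree, viz. 1, of $\operatorname{tr}(\Lambda)$. Since Λ is positive semidefinite, $\operatorname{tr}(\Lambda)\to0$ if and only if $\Lambda\to0$; therefore,

$$\lim_{\operatorname{tr}(\Lambda)\to0}\frac{C_{(k)}(\Lambda)}{\operatorname{tr}(\Lambda)} = \lim_{\Lambda\to0}\frac{C_{(k)}(\Lambda)}{\operatorname{tr}(\Lambda)} = 0.$$

Collecting these facts together, and dividing by the expression for $\sqrt{\widetilde{\mathcal V}^2(X,X)\,\widetilde{\mathcal V}^2(Y,Y)}$ given above, we obtain (4.1). □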



If p = q = 1 we are in the situation of Theorem 7(iii) in Szekely, et al. (2007). Applying the identity (3.17), we obtain

$${}_2F_1\big(-\tfrac12,-\tfrac12;\tfrac12;\tfrac14\big) = \frac{\pi}{12} + \frac{\sqrt3}{2},$$

so that $A(1) = 1 + \frac13\pi - \sqrt3$, and $(\operatorname{tr}(\Lambda))^{1/2} = |\rho|$. Thus we obtain

$$\lim_{\rho\to0}\frac{\widetilde{\mathcal R}(X,Y)}{|\rho|} = \frac{1}{2\,(1 + \frac13\pi - \sqrt3)^{1/2}},$$

as shown by Szekely, et al. (2007, p. 2785).
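Theorem 4.1 can be illustrated numerically in the case q = 1. There Λ is a scalar z, and the series (3.7), stripped of its prefactor, has coefficients $(1-2^{1-2k})(-\frac12)_k^2/((\frac12p)_k\,k!)$. The Python sketch below is our own, with hypothetical function names. It compares $\widetilde{\mathcal R}^2/z$ at a small z with the limit $1/(4p\sqrt{A(p)A(1)})$ from (4.1), and recovers the constant $\approx 0.8907$ for p = q = 1.

```python
import math

def hyp2f1_series(a, b, c, z, terms=200):
    total, term = 1.0, 1.0
    for k in range(terms):
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * z
        total += term
    return total

def A(p):
    """A(p) of (3.19)."""
    gauss = math.exp(math.lgamma(p / 2) + math.lgamma(p / 2 + 1) - 2 * math.lgamma((p + 1) / 2))
    return gauss - 2.0 * hyp2f1_series(-0.5, -0.5, p / 2, 0.25) + 1.0

def affine_dcor2_q1(z, p, terms=200):
    """R~^2(X, Y) for q = 1 and scalar Lambda = z, via (3.7) and (3.20)."""
    total, coef = 0.0, 1.0
    for k in range(1, terms + 1):
        # coef = (-1/2)_k^2 z^k / ((p/2)_k k!), updated recursively.
        coef *= (k - 1.5) ** 2 / ((p / 2 + k - 1) * k) * z
        total += (1.0 - 2.0 ** (1 - 2 * k)) * coef
    return total / math.sqrt(A(p) * A(1))

z = 1e-6
for p in (1, 2, 5):
    print(p, affine_dcor2_q1(z, p) / z, 1.0 / (4 * p * math.sqrt(A(p) * A(1))))
print(math.sqrt(affine_dcor2_q1(z, 1) / z), 1.0 / (2.0 * math.sqrt(1.0 + math.pi / 3.0 - math.sqrt(3.0))))
```

Summing the coefficients directly, rather than subtracting hypergeometric values, avoids the cancellation that a difference of ${}_2F_1$ values close to 1 would incur at small z.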

In the remainder of this section we consider situations in which one or both of the 
dimensions p and q grow without bound. We will repeatedly make use of the fact that, 
with $c_p$ defined as in (1.2),
$$\frac{c_{p-1}}{c_p} \sim \left(\frac{p}{2\pi}\right)^{1/2} \tag{4.2}$$
as $p\to\infty$, which follows easily from the functional equation for the gamma function along with Stirling's formula.
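A quick numerical check of (4.2): the sketch computes $c_{p-1}/c_p = \pi^{-1/2}\,\Gamma(\frac12(p+1))/\Gamma(\frac12 p)$ in log space via `math.lgamma`, so that large $p$ does not underflow or overflow; the helper names are ours. The ratio to $(p/2\pi)^{1/2}$ should approach 1, with a relative error of order $1/p$.

```python
import math

def log_c(p):
    """Logarithm of c_p = pi^{(1+p)/2} / Gamma((1+p)/2) from (1.2)."""
    return 0.5 * (1 + p) * math.log(math.pi) - math.lgamma(0.5 * (1 + p))

def c_ratio(p):
    """c_{p-1} / c_p, computed in log space so that large p stays finite."""
    return math.exp(log_c(p - 1) - log_c(p))

def ratio_vs_asymptote(p):
    """(c_{p-1} / c_p) / (p / (2 pi))^{1/2}; by (4.2) this tends to 1."""
    return c_ratio(p) / math.sqrt(p / (2 * math.pi))

for p in (1, 10, 100, 10_000, 1_000_000):
    print(p, ratio_vs_asymptote(p))
```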

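Theorem 4.2. For each positive integer $p$, suppose that $(X_p,Y_p)\sim\mathcal{N}_{2p}(\mu_p,\Sigma_p)$, where
$$\Sigma_p = \begin{pmatrix}\Sigma_{X,p} & \Sigma_{XY,p}\\ \Sigma_{YX,p} & \Sigma_{Y,p}\end{pmatrix},$$
with $\Sigma_{X,p}\in\mathbb{R}^{p\times p}$ and $\Sigma_{Y,p}\in\mathbb{R}^{p\times p}$ being positive definite and such that
$$\Lambda_p = \Sigma_{Y,p}^{-1/2}\,\Sigma_{YX,p}\,\Sigma_{X,p}^{-1}\,\Sigma_{XY,p}\,\Sigma_{Y,p}^{-1/2} \neq 0.$$
Then
$$\lim_{p\to\infty}\frac{\widetilde{\mathcal{V}}^2(X_p,Y_p)}{\mathrm{tr}(\Lambda_p)/p} = \frac12 \tag{4.3}$$
and
$$\lim_{p\to\infty}\frac{\widetilde{\mathcal{R}}^2(X_p,Y_p)}{\mathrm{tr}(\Lambda_p)/p} = 1. \tag{4.4}$$

In particular, if $\Lambda_p = r I_p$ for some $r\in[0,1]$, then $\mathrm{tr}(\Lambda_p) = rp$, and so (4.3) and (4.4) reduce to
$$\lim_{p\to\infty}\widetilde{\mathcal{V}}^2(X_p,Y_p) = \frac12 r \quad\text{and}\quad \lim_{p\to\infty}\widetilde{\mathcal{R}}(X_p,Y_p) = r^{1/2},$$
respectively. The following corollary concerns the special case in which $r = 1$; we state it separately for emphasis.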

Corollary 4.3. For each positive integer $p$, suppose that $X_p\sim\mathcal{N}_p(\mu_p,\Sigma_p)$, with $\Sigma_p$ being positive definite. Then
$$\lim_{p\to\infty}\widetilde{\mathcal{V}}^2(X_p,X_p) = \frac12. \tag{4.5}$$

Proof of Theorem 4.2 and Corollary 4.3. In order to prove (4.3), we study the limit for the terms corresponding separately to $k = 1$, $k = 2$, and $k \ge 3$ in (3.7).

For $k = 1$, on recalling that $C_{(1)}(\Lambda_p) = \mathrm{tr}(\Lambda_p)$, it follows from (4.2) that the ratio of that term to $\mathrm{tr}(\Lambda_p)/p$ tends to $1/2$.

For $k = 2$, we first deduce from (3.3) that $C_{(2)}(\Lambda_p) \le (\mathrm{tr}\,\Lambda_p)^2$. Moreover, $\mathrm{tr}(\Lambda_p) \le p$ because $\Lambda_p \le I_p$ in the Loewner ordering. Thus, the ratio of the second term in (3.7) to $\mathrm{tr}(\Lambda_p)/p$ is a constant multiple of
$$\frac{p\,c_{p-1}^2}{c_p^2\,[(\frac12 p)_2]^2}\,\frac{C_{(2)}(\Lambda_p)}{\mathrm{tr}(\Lambda_p)} \le \frac{p^2\,c_{p-1}^2}{c_p^2\,[(\frac12 p)_2]^2},$$
which, by (4.2), converges to zero as $p\to\infty$.

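Finally, suppose that $k \ge 3$. Writing $\|\Lambda_p\|$ for the largest eigenvalue of $\Lambda_p$, we have $\Lambda_p \le \|\Lambda_p\|\,I_p$ in the Loewner ordering, and so it follows from (3.5) that $C_{(k)}(\Lambda_p) \le \|\Lambda_p\|^k\,C_{(k)}(I_p)$. Also, $\mathrm{tr}(\Lambda_p) \ge \|\Lambda_p\|$ and, since $\Lambda_p \le I_p$, $\|\Lambda_p\| \le 1$; hence, by again applying the Loewner ordering inequality and (3.6), we obtain
$$\frac{C_{(k)}(\Lambda_p)}{\mathrm{tr}(\Lambda_p)} \le \frac{\|\Lambda_p\|^k\,C_{(k)}(I_p)}{\|\Lambda_p\|} = \|\Lambda_p\|^{k-1}\,C_{(k)}(I_p) \le C_{(k)}(I_p). \tag{4.6}$$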



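Therefore,
$$\frac{4\pi\,p}{\mathrm{tr}(\Lambda_p)}\,\frac{c_{p-1}^2}{c_p^2}\sum_{k=3}^{\infty}\frac{2^{2k}-2}{k!\,2^{2k}}\,\frac{(\frac12)_k(-\frac12)_k(-\frac12)_k}{[(\frac12 p)_k]^2}\,C_{(k)}(\Lambda_p) \le 4\pi\,\frac{p\,c_{p-1}^2}{c_p^2}\sum_{k=3}^{\infty}\frac{2^{2k}-2}{k!\,2^{2k}}\,\frac{(-\frac12)_k(-\frac12)_k}{(\frac12 p)_k}.$$
By (4.2), each term $p\,c_{p-1}^2/[(\frac12 p)_k\,c_p^2]$ converges to zero as $p\to\infty$, and this proves both (4.3) and its special case, (4.5). Then (4.4) follows immediately. □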
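To see (4.3) and (4.5) numerically, the sketch below evaluates the series (3.7) with $p = q$ and $\Lambda_p = rI_p$. It assumes the zonal-polynomial identity $C_{(k)}(I_p) = (\frac12 p)_k/(\frac12)_k$ (consistent with $C_{(1)}(I_p) = p$), under which the $k$-th summand reduces to $(1-2\cdot 4^{-k})\,(-\frac12)_k^2\,r^k/(k!\,(\frac12 p)_k)$ times the prefactor $4\pi c_{p-1}^2/c_p^2$. The function name is hypothetical. For $p = 1$ and $r = 1$ the value should match the squared distance variance of a standard normal variable, $(4/\pi)(1+\pi/3-\sqrt{3})$, which follows from the bivariate normal closed form of Szekely, et al. (2007) at $\rho = 1$.

```python
import math

def log_c(p):
    """Logarithm of c_p = pi^{(1+p)/2} / Gamma((1+p)/2) from (1.2)."""
    return 0.5 * (1 + p) * math.log(math.pi) - math.lgamma(0.5 * (1 + p))

def affine_dcov2_equal_dims(p, r, n_terms=20_000):
    """Series (3.7) for p = q and Lambda = r I_p (hypothetical helper),
    assuming C_(k)(r I_p) = r^k (p/2)_k / (1/2)_k."""
    prefactor = 4 * math.pi * math.exp(2 * (log_c(p - 1) - log_c(p)))
    a_k, total = 1.0, 0.0
    for k in range(1, n_terms + 1):
        # a_k = (-1/2)_k^2 r^k / ((p/2)_k k!), updated from a_{k-1}
        a_k *= (k - 1.5) ** 2 * r / ((0.5 * p + k - 1) * k)
        # (2^{2k} - 2) / 2^{2k} = 1 - 2 * 4^{-k}
        total += (1 - 2 * 4.0 ** (-k)) * a_k
        if a_k < 1e-18 * total:
            break
    return prefactor * total

# p = 1, r = 1: squared distance variance of a standard normal variable.
print(affine_dcov2_equal_dims(1, 1.0), 4 / math.pi * (1 + math.pi / 3 - math.sqrt(3)))
# Large p: the value approaches r / 2, and 1/2 when r = 1.
for p in (10, 100, 4000):
    print(p, affine_dcov2_equal_dims(p, 0.3), affine_dcov2_equal_dims(p, 1.0))
```

For $p = 1$ the terms decay only like $k^{-5/2}$, which is why the loop allows many terms; for large $p$ the early-exit test stops it after a handful.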



Finally, we consider the situation in which q, the dimension of Y, is fixed while p, 
the dimension of X, grows without bound. 

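Theorem 4.4. For each positive integer $p$, suppose that $(X_p,Y)\sim\mathcal{N}_{p+q}(\mu_p,\Sigma_p)$, where
$$\Sigma_p = \begin{pmatrix}\Sigma_{X,p} & \Sigma_{XY,p}\\ \Sigma_{YX,p} & \Sigma_Y\end{pmatrix},$$
with $\Sigma_{X,p}\in\mathbb{R}^{p\times p}$ and $\Sigma_Y\in\mathbb{R}^{q\times q}$ being positive definite and such that
$$\Lambda_p = \Sigma_Y^{-1/2}\,\Sigma_{YX,p}\,\Sigma_{X,p}^{-1}\,\Sigma_{XY,p}\,\Sigma_Y^{-1/2} \neq 0.$$
Then
$$\lim_{p\to\infty}\frac{\sqrt{p}}{\mathrm{tr}(\Lambda_p)}\,\widetilde{\mathcal{V}}^2(X_p,Y) = \left(\frac{\pi}{2}\right)^{1/2}\frac{c_{q-1}}{q\,c_q} \tag{4.7}$$
and
$$\lim_{p\to\infty}\frac{\sqrt{p}}{\mathrm{tr}(\Lambda_p)}\,\widetilde{\mathcal{R}}^2(X_p,Y) = \frac{1}{2q\,A(q)^{1/2}}. \tag{4.8}$$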



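Proof. By (3.7),
$$\widetilde{\mathcal{V}}^2(X_p,Y) = 4\pi\,\frac{c_{p-1}\,c_{q-1}}{c_p\,c_q}\sum_{k=1}^{\infty}\frac{2^{2k}-2}{k!\,2^{2k}}\,\frac{(\frac12)_k(-\frac12)_k(-\frac12)_k}{(\frac12 p)_k(\frac12 q)_k}\,C_{(k)}(\Lambda_p).$$
We now examine the limiting behavior, as $p\to\infty$, of the terms in this sum for $k = 1$ and, separately, for $k \ge 2$.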
For $k = 1$, the limiting value of the ratio of the corresponding term to $\mathrm{tr}(\Lambda_p)/\sqrt{p}$ equals
$$\frac{\pi\,c_{q-1}}{q\,c_q}\,\lim_{p\to\infty}\frac{\sqrt{p}\,c_{p-1}}{p\,c_p}\,\frac{C_{(1)}(\Lambda_p)}{\mathrm{tr}(\Lambda_p)} = \left(\frac{\pi}{2}\right)^{1/2}\frac{c_{q-1}}{q\,c_q},$$
by (4.2) and the fact that $C_{(1)}(\Lambda_p) = \mathrm{tr}(\Lambda_p)$.



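For $k \ge 2$, the ratio of the sum to $\mathrm{tr}(\Lambda_p)/\sqrt{p}$ equals
$$\begin{aligned}
&\frac{4\pi\sqrt{p}}{\mathrm{tr}(\Lambda_p)}\,\frac{c_{p-1}\,c_{q-1}}{c_p\,c_q}\sum_{k=2}^{\infty}\frac{2^{2k}-2}{k!\,2^{2k}}\,\frac{(\frac12)_k(-\frac12)_k(-\frac12)_k}{(\frac12 p)_k(\frac12 q)_k}\,C_{(k)}(\Lambda_p)\\
&\qquad\le 4\pi\sqrt{p}\,\frac{c_{p-1}\,c_{q-1}}{c_p\,c_q}\sum_{k=2}^{\infty}\frac{2^{2k}-2}{k!\,2^{2k}}\,\frac{(-\frac12)_k(-\frac12)_k}{(\frac12 p)_k}\,\|\Lambda_p\|^{k-1}\\
&\qquad\le 4\pi\sqrt{p}\,\frac{c_{p-1}\,c_{q-1}}{c_p\,c_q}\sum_{k=2}^{\infty}\frac{2^{2k}-2}{k!\,2^{2k}}\,\frac{(-\frac12)_k(-\frac12)_k}{(\frac12 p)_k},
\end{aligned}$$
where we have used (4.6), with $I_p$ replaced by $I_q$, to obtain the last two inequalities. By applying (4.2), we see that the latter upper bound converges to zero as $p\to\infty$, which proves (4.7), and then (4.8) follows immediately. □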
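A similar sketch illustrates (4.7) with $q$ fixed and $\Lambda_p = rI_q$, again assuming $C_{(k)}(I_q) = (\frac12 q)_k/(\frac12)_k$, so that the factors $(\frac12)_k$ and $(\frac12 q)_k$ in (3.7) cancel. Since $\mathrm{tr}(\Lambda_p) = qr$, the quantity $\sqrt{p}\,\widetilde{\mathcal{V}}^2(X_p,Y)/(qr)$ should approach $(\pi/2)^{1/2}c_{q-1}/(q\,c_q)$. The choice $q = 2$, $r = 0.4$ and the helper names are arbitrary and hypothetical.

```python
import math

def log_c(p):
    """Logarithm of c_p = pi^{(1+p)/2} / Gamma((1+p)/2) from (1.2)."""
    return 0.5 * (1 + p) * math.log(math.pi) - math.lgamma(0.5 * (1 + p))

def affine_dcov2(p, q, r, n_terms=20_000):
    """Series (3.7) with Lambda_p = r I_q (requires q <= p), assuming
    C_(k)(I_q) = (q/2)_k / (1/2)_k so that (q/2)_k and (1/2)_k cancel."""
    log_pref = log_c(p - 1) - log_c(p) + log_c(q - 1) - log_c(q)
    a_k, total = 1.0, 0.0
    for k in range(1, n_terms + 1):
        # a_k = (-1/2)_k^2 r^k / ((p/2)_k k!), updated from a_{k-1}
        a_k *= (k - 1.5) ** 2 * r / ((0.5 * p + k - 1) * k)
        total += (1 - 2 * 4.0 ** (-k)) * a_k
        if a_k < 1e-18 * total:
            break
    return 4 * math.pi * math.exp(log_pref) * total

def limit_4_7(q):
    """Right-hand side of (4.7): (pi/2)^{1/2} c_{q-1} / (q c_q)."""
    return math.sqrt(math.pi / 2) * math.exp(log_c(q - 1) - log_c(q)) / q

q, r = 2, 0.4
for p in (10, 1_000, 100_000):
    # tr(Lambda_p) = q r when Lambda_p = r I_q
    scaled = math.sqrt(p) * affine_dcov2(p, q, r) / (q * r)
    print(p, scaled, limit_4_7(q))
```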



The results in this section have practical implications for affine distance correlation analysis of large-sample, high-dimensional Gaussian data. In the setting of Theorem 4.4, $\mathrm{tr}(\Lambda_p) \le q$ is bounded, and so
$$\lim_{p\to\infty}\widetilde{\mathcal{R}}(X_p,Y) = 0.$$
As a consequence of Theorem 2.1 on the consistency of sample measures, it follows that the direct calculation of affine distance correlation measures for such data will return values that are virtually zero. In practice, in order to obtain values of the sample affine distance correlation measures which permit statistical inference, it will be necessary to calculate $\widehat{\Lambda}_p$, the maximum likelihood estimator of $\Lambda_p$, and then to rescale the distance correlation measures with the factor $\sqrt{p}/\mathrm{tr}(\widehat{\Lambda}_p)$. In the scenario of Theorem 4.2, the asymptotic behavior of the affine distance correlation measures depends on the ratio $p/\mathrm{tr}(\Lambda_p)$; as $\mathrm{tr}(\Lambda_p)$ can attain any value in the interval $[0,p]$, a wide range of asymptotic rates of convergence is conceivable.
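A minimal NumPy sketch of the rescaling step, assuming the observations of $X_p$ and $Y$ are stored row-wise in arrays `x` (n × p) and `y` (n × q). Under Gaussianity, the maximum likelihood estimator of $\Lambda_p$ is obtained by plugging the divisor-$n$ sample covariance blocks into its definition. Only the trace is needed, and $\mathrm{tr}(\Sigma_Y^{-1/2}M\Sigma_Y^{-1/2}) = \mathrm{tr}(\Sigma_Y^{-1}M)$, so no matrix square root is formed. The helper names `trace_lambda_hat` and `rescale_factor` are hypothetical.

```python
import numpy as np

def trace_lambda_hat(x, y):
    """tr(Lambda_hat), Lambda_hat = S_Y^{-1/2} S_YX S_X^{-1} S_XY S_Y^{-1/2},
    from divisor-n (maximum likelihood) covariance blocks; rows are observations."""
    p = x.shape[1]
    s = np.cov(np.hstack([x, y]), rowvar=False, bias=True)
    s_x, s_xy, s_y = s[:p, :p], s[:p, p:], s[p:, p:]
    # trace of S_Y^{-1} S_YX S_X^{-1} S_XY equals trace of Lambda_hat
    m = np.linalg.solve(s_y, s_xy.T) @ np.linalg.solve(s_x, s_xy)
    return float(np.trace(m))

def rescale_factor(x, y):
    """Factor sqrt(p) / tr(Lambda_hat_p) used to rescale the sample measures."""
    return float(np.sqrt(x.shape[1]) / trace_lambda_hat(x, y))

rng = np.random.default_rng(0)
x = rng.standard_normal((500, 5))
# Y is a linear function of X: both squared canonical correlations equal 1.
y = x[:, :2] @ np.array([[1.0, 0.5], [0.2, 2.0]])
y_indep = rng.standard_normal((500, 2))
print(trace_lambda_hat(x, y), rescale_factor(x, y), trace_lambda_hat(x, y_indep))
```

The trace is the sum of the squared sample canonical correlations, so it always lies in $[0, \min(p,q)]$.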



In all these settings, the series representation (3.7) can be used to obtain complete asymptotic expansions, in powers of $p^{-1}$ or $q^{-1}$, of the affine distance covariance or correlation measures as $p$ or $q$ tends to infinity.



5 Time Series of Wind Vectors at the Stateline Wind Energy Center

Remillard (2009) proposed the use of the distance correlation to explore nonlinear dependencies in time series data. Zhou (2012) recently pursued this approach and defined the auto distance covariance function and the auto distance correlation function, along with natural sample versions, for a strongly stationary vector-valued time series, say $(X_j)_{j=-\infty}^{\infty}$.

It is straightforward to extend these notions to the affinely invariant distance correlation. Thus, for an integer $k$, we refer to
$$\widetilde{\mathcal{R}}_X(k) = \frac{\widetilde{\mathcal{V}}(X_j,X_{j+k})}{\widetilde{\mathcal{V}}(X_j,X_j)} \tag{5.1}$$
as the affinely invariant auto distance correlation at lag $k$. Similarly, given jointly strongly stationary, vector-valued time series $(X_j)_{j=-\infty}^{\infty}$ and $(Y_j)_{j=-\infty}^{\infty}$, we refer to
$$\widetilde{\mathcal{R}}_{X,Y}(k) = \frac{\widetilde{\mathcal{V}}(X_j,Y_{j+k})}{\sqrt{\widetilde{\mathcal{V}}(X_j,X_j)\,\widetilde{\mathcal{V}}(Y_j,Y_j)}} \tag{5.2}$$
as the affinely invariant cross distance correlation at lag $k$. The corresponding sample versions can be defined in the natural way, as in the case of the non-affine distance correlation (Zhou, 2012).
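The sample versions of (5.1) and (5.2) can be sketched as follows, assuming each series is stored as a NumPy array with one row per time point. This is one natural construction, not necessarily the one used for the figures: each block is whitened by the inverse symmetric square root of its divisor-$n$ sample covariance, and the double-centered V-statistic distance correlation of Szekely, et al. (2007) is then applied to the lagged pairs $(X_j, Y_{j+k})$. All function names are hypothetical, and the pairwise distance matrices need $O(n^2)$ memory, which is substantial for $n = 10{,}800$.

```python
import numpy as np

def _double_centered(z):
    """Double-centered matrix of Euclidean distances between the rows of z."""
    d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()

def _whiten(z):
    """Map rows z_j to S^{-1/2} (z_j - zbar), with S the divisor-n sample covariance."""
    s = np.atleast_2d(np.cov(z, rowvar=False, bias=True))
    w, v = np.linalg.eigh(s)
    return (z - z.mean(axis=0)) @ (v @ np.diag(w ** -0.5) @ v.T)

def dcor(x, y):
    """Sample distance correlation of paired rows of x (n, p) and y (n, q)."""
    a, b = _double_centered(x), _double_centered(y)
    v_xy, v_xx, v_yy = (a * b).mean(), (a * a).mean(), (b * b).mean()
    if v_xx * v_yy == 0:
        return 0.0
    return float(np.sqrt(max(v_xy, 0.0) / np.sqrt(v_xx * v_yy)))

def affine_dcor(x, y):
    """Affinely invariant sample version: whiten each block, then dcor."""
    return dcor(_whiten(x), _whiten(y))

def cross_affine_dcor(x, y, lag):
    """Sample analogue of (5.2) at integer lag k, pairing x_j with y_{j+k}."""
    n = len(x)
    if lag >= 0:
        return affine_dcor(x[: n - lag], y[lag:])
    return affine_dcor(x[-lag:], y[: n + lag])

rng = np.random.default_rng(1)
x = rng.standard_normal((150, 2))
y = np.tanh(x) + 0.3 * rng.standard_normal((150, 2))
a = np.array([[2.0, 1.0], [0.0, 3.0]])
print(affine_dcor(x, y), affine_dcor(x @ a, y), cross_affine_dcor(x, y, 6))
```

The auto version is `cross_affine_dcor(x, x, k)`. At 10-minute resolution, a lag of one hour corresponds to `lag=6`; with hypothetical (n × 2) arrays `g` and `v` holding the Goodnoe Hills and Vansycle wind vectors, `cross_affine_dcor(g, v, 6)` pairs each Goodnoe Hills observation with the Vansycle observation one hour later.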

We illustrate these concepts on time series data of wind observations at and near the Stateline wind energy center in the Pacific Northwest of the United States. Specifically, we consider time series of bivariate wind vectors at the meteorological towers at Vansycle, right at the Stateline wind farm at the border of the states of Washington and Oregon, and at Goodnoe Hills, 146 km west of Vansycle along the Columbia River Gorge. Further information can be found in the paper by Gneiting, et al. (2006), who developed a regime-switching space-time (RST) technique for 2-hour-ahead forecasts of hourly average wind speed at the Stateline wind energy center, which was then the largest wind farm globally. For our purposes, we follow Hering and Genton (2010) in studying the time series at the original 10-minute resolution, and we restrict our analysis to the longest continuous record, the 75-day interval from August 14 to October 28, 2002.

Thus, we consider time series of bivariate wind vectors over 10,800 consecutive 10-minute intervals. We write $V_j^{NS}$ and $V_j^{EW}$ to denote the north-south and the east-west components of the wind vector at Vansycle at time $j$, with positive values corresponding to northerly and easterly winds. Similarly, we write $G_j^{NS}$ and $G_j^{EW}$ for the north-south and the east-west components of the wind vector at Goodnoe Hills at time $j$, respectively.

Figure 3 shows the classical (Pearson) sample auto and cross correlation functions for the four univariate time series. The auto correlation functions generally decay with the temporal lag, but do so non-monotonically, due to the presence of a diurnal component. The cross correlation functions between the wind vector components at Vansycle and Goodnoe Hills show remarkable asymmetries and peak at positive lags, due to the prevailing westerly and southwesterly winds (Gneiting, et al. 2006). In another interesting feature, the cross correlations between the north-south and east-west components at lag zero are strongly positive, documenting the dominance of southwesterly winds.

Figure 3: Sample auto and cross Pearson correlation functions for the univariate time series $V_j^{EW}$, $V_j^{NS}$, $G_j^{EW}$, and $G_j^{NS}$, respectively. Positive lags indicate observations at the westerly site (Goodnoe Hills) leading those at the easterly site (Vansycle), or observations of the north-south component leading those of the east-west component, in units of hours.

Figure 4 shows the sample auto and cross distance correlation functions for the four time series; as these variables are univariate, there is no distinction between the standard and the affinely invariant version of the distance correlation. The patterns resemble those of the Pearson correlation. For comparison, we also display values of the distance correlation based on the sample Pearson correlations shown in Figure 3 and converted to distance correlation under the assumption of bivariate Gaussianity, using the results of Szekely, et al. (2007, p. 2786) and Section 3; in every case, these values are smaller than the original ones.
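For univariate pairs, the conversion from Pearson to distance correlation under bivariate Gaussianity can be sketched with the closed form for the bivariate normal distribution given by Szekely, et al. (2007),
$$\mathcal{R}^2 = \frac{\rho\arcsin\rho + \sqrt{1-\rho^2} - \rho\arcsin(\rho/2) - \sqrt{4-\rho^2} + 1}{1+\pi/3-\sqrt{3}}.$$
We state the formula as we understand it, so it should be checked against that paper before use; the function name `dcor_from_pearson` is hypothetical.

```python
import math

def dcor_from_pearson(rho):
    """Distance correlation of a bivariate normal pair with Pearson correlation rho
    (closed form as stated above; the expression is even in rho)."""
    rho = abs(rho)
    num = (rho * math.asin(rho) + math.sqrt(1 - rho ** 2)
           - rho * math.asin(rho / 2) - math.sqrt(4 - rho ** 2) + 1)
    den = 1 + math.pi / 3 - math.sqrt(3)
    # guard against tiny negative rounding error near rho = 0
    return math.sqrt(max(num, 0.0) / den)

for rho in (0.1, 0.5, 0.9, 1.0):
    print(rho, dcor_from_pearson(rho))
```

Consistent with the observation above, the converted value is smaller than $|\rho|$ for $0 < |\rho| < 1$; for instance, $\rho = 0.5$ gives a distance correlation of roughly 0.45.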

Having considered the univariate time series setting, it is natural and complementary to look at the wind vector time series $(V_j^{EW}, V_j^{NS})'$ at Vansycle and $(G_j^{EW}, G_j^{NS})'$ at Goodnoe Hills from a genuinely multivariate perspective. To this end, Figure 5 shows the sample affinely invariant auto and cross distance correlation functions for the bivariate wind vector series at the two sites. Again, a diurnal component is visible, and there is a remarkable asymmetry in the cross distance correlation functions, which peak at lags of about two to three hours.

Figure 4: Sample auto and cross distance correlation functions for the univariate time series $V_j^{EW}$, $V_j^{NS}$, $G_j^{EW}$, and $G_j^{NS}$, respectively. For comparison, we also display, in grey, the values that arise when the sample Pearson correlations in Figure 3 are converted to distance correlation under the assumption of Gaussianity; these values generally are smaller than the original ones. Positive lags indicate observations at Goodnoe Hills leading those at Vansycle, or observations of the north-south component leading those of the east-west component, in units of hours.

In light of our analytical results in Section 3, we can compute the affinely invariant distance correlation between subvectors of a multivariate normally distributed random vector. In particular, we can compute the affinely invariant auto and cross distance correlation between bivariate subvectors of a 4-variate Gaussian process with Pearson auto and cross correlations as shown in Figure 3. In Figure 5, values of the affinely invariant distance correlation that have been derived from Pearson correlations in this way are shown in grey; they differ substantially from the values computed directly from the data, with the converted values being smaller, which suggests that the assumption of Gaussianity may not be appropriate for this particular data set.

Figure 5: Sample auto and cross affinely invariant distance correlation functions for the bivariate time series $(V_j^{EW}, V_j^{NS})'$ and $(G_j^{EW}, G_j^{NS})'$ at Vansycle and Goodnoe Hills. For comparison, we also display, in grey, the values that are generated when the Pearson correlations in Figure 3 are converted to the affinely invariant distance correlation under the assumption of Gaussianity; these converted values generally are smaller than the original ones. Positive lags indicate observations at Goodnoe Hills leading those at Vansycle, in units of hours.

We wish to emphasize that our study is purely exploratory: it is provided for illustrative purposes and to serve as a basic example. In future work, the approach hinted at here could potentially be developed into parametric bootstrap tests for Gaussianity. Following the pioneering work of Zhou (2012), the distance correlation may indeed find a wealth of applications in exploratory and inferential problems for time series data.

6 Discussion 

In this paper, we have studied an affinely invariant version of the distance correlation 
measure introduced by Szekely, et al. (2007) and Szekely and Rizzo (2009) in both 
population and sample settings (see Szekely and Rizzo (2012) for further aspects of 
the role of invariance in properties of distance correlation measures). The affinely 
invariant distance correlation shares the desirable properties of the standard version of 
the distance correlation and equals the latter in the univariate case. In the multivariate 
case, the affinely invariant distance correlation remains unchanged under invertible 
affine transformations, unlike the standard version, which is preserved under orthogonal 
transformations only. Furthermore, the affinely invariant distance correlation admits an 
exact and readily computable expression in the case of subvectors from a multivariate 
normal population. As we show in the Appendix, the standard version allows for a series expansion, too, but this series does not appear to simplify in general, and further research will be necessary to make it accessible to efficient numerical computation.

Competing measures of dependence have also featured prominently in recent work (Reshef, et al. 2011; Speed, 2011). However, those measures are restricted to univariate settings, and claims of superior performance in exploratory data analysis have been disputed (Gorfine, Heller and Heller, 2012; Simon and Tibshirani, 2012). We therefore share the view of Newton (2009) that the distance correlation and the affinely invariant distance correlation might become uniquely useful, and potentially the predominant, measures of dependence and association for the 21st century. A potential drawback for large data sets is the computational cost of the sample distance covariance, and the development of computationally efficient algorithms or subsampling techniques for this purpose is highly desirable.



Appendix: The Distance Covariance and Distance Correlation for Multivariate Normal Populations



In Theorem 3.1 and Corollary 3.2, we calculated the affinely invariant distance covariance for multivariate normal populations. Here, we consider the problem of deriving a formula for the standard distance covariance and distance correlation.



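We first consider the case in which $\Sigma_X$ and $\Sigma_Y$ are scalar matrices, say, $\Sigma_X = \sigma_x^2 I_p$ and $\Sigma_Y = \sigma_y^2 I_q$ with $\sigma_x, \sigma_y > 0$. Thus, suppose that $(X,Y)\sim\mathcal{N}_{p+q}(\mu,\Sigma)$, where
$$\Sigma = \begin{pmatrix}\sigma_x^2 I_p & \Sigma_{XY}\\ \Sigma_{YX} & \sigma_y^2 I_q\end{pmatrix}.$$
Putting $\Lambda = \Sigma_{YX}\Sigma_{XY}$, we follow the proofs of Theorem 3.1 and Corollary 3.2 to obtain
$$\begin{aligned}
\mathcal{V}^2(X,Y) &= 4\pi\,\frac{c_{p-1}\,c_{q-1}}{c_p\,c_q}\,\sigma_x\sigma_y\sum_{k=1}^{\infty}\frac{2^{2k}-2}{k!\,2^{2k}}\,\frac{(\frac12)_k(-\frac12)_k(-\frac12)_k}{(\frac12 p)_k(\frac12 q)_k}\,C_{(k)}\Big(\frac{\Lambda}{\sigma_x^2\sigma_y^2}\Big)\\
&= 4\pi\,\frac{c_{p-1}\,c_{q-1}}{c_p\,c_q}\,\sigma_x\sigma_y\Big\{\Big[{}_3F_2\big(\tfrac12,-\tfrac12,-\tfrac12;\tfrac12 p,\tfrac12 q;\Lambda/\sigma_x^2\sigma_y^2\big)-1\Big]\\
&\qquad -2\Big[{}_3F_2\big(\tfrac12,-\tfrac12,-\tfrac12;\tfrac12 p,\tfrac12 q;\Lambda/4\sigma_x^2\sigma_y^2\big)-1\Big]\Big\}.
\end{aligned}$$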
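The factor $\sigma_x\sigma_y$ in the scalar-case formula reflects the scaling $\mathcal{V}^2(\sigma_x X,\sigma_y Y) = \sigma_x\sigma_y\,\mathcal{V}^2(X,Y)$ for $\sigma_x, \sigma_y > 0$, which reduces the computation to identity covariance blocks with cross-covariance $\Sigma_{XY}/(\sigma_x\sigma_y)$. Because every pairwise distance scales by the same factor, the identity also holds exactly for the sample (V-statistic) distance covariance; the NumPy sketch below, with a hypothetical helper `dcov2` and simulated data, checks this numerically.

```python
import numpy as np

def dcov2(x, y):
    """Sample squared distance covariance (V-statistic) of paired rows of x and y."""
    def centered(z):
        d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
        return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()
    return float((centered(x) * centered(y)).mean())

rng = np.random.default_rng(0)
x = rng.standard_normal((200, 3))
y = x[:, :2] + rng.standard_normal((200, 2))
sigma_x, sigma_y = 2.0, 0.75
# Scaling X by sigma_x and Y by sigma_y multiplies V^2 by sigma_x * sigma_y.
ratio = dcov2(sigma_x * x, sigma_y * y) / dcov2(x, y)
print(ratio)
```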



Next we reduce the general case to the scalar case above. By making a diagonal transformation of the form (1.4), we see that we may assume, without loss of generality, that $\Sigma_X$ and $\Sigma_Y$ are diagonal matrices. Now denote by $\sigma_x^2$ and $\sigma_y^2$ the smallest eigenvalues of $\Sigma_X$ and $\Sigma_Y$, respectively. Also, let $\Delta_X = \Sigma_X - \sigma_x^2 I_p$ and $\Delta_Y = \Sigma_Y - \sigma_y^2 I_q$; then $\Sigma_X = \Delta_X + \sigma_x^2 I_p$ and $\Sigma_Y = \Delta_Y + \sigma_y^2 I_q$. Substituting these decompositions into the integral that defines $\mathcal{V}^2(X,Y)$, we obtain
$$\begin{aligned}
&\int_{\mathbb{R}^{p+q}}\big(1-\exp(s'\Sigma_{XY}t)\big)^2\exp(-s'\Sigma_X s - t'\Sigma_Y t)\,\frac{ds\,dt}{|s|_p^{p+1}\,|t|_q^{q+1}}\\
&\quad= \int_{\mathbb{R}^{p+q}}\big(1-\exp(s'\Sigma_{XY}t)\big)^2\exp(-s'\Delta_X s - t'\Delta_Y t)\exp\big(-\sigma_x^2|s|_p^2 - \sigma_y^2|t|_q^2\big)\,\frac{ds\,dt}{|s|_p^{p+1}\,|t|_q^{q+1}}.
\end{aligned}$$
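Next, we apply a Taylor expansion,
$$\big(1-\exp(s'\Sigma_{XY}t)\big)^2 = \sum_{k=2}^{\infty}\frac{2^k-2}{k!}\,(s'\Sigma_{XY}t)^k,$$
and, writing $\Delta_X = \mathrm{diag}(\delta_{x1},\dots,\delta_{xp})$, we have
$$\exp(-s'\Delta_X s) = \sum_{l=0}^{\infty}(-1)^l\sum_{l_1+\cdots+l_p=l}\ \prod_{i=1}^{p}\frac{(\delta_{xi}s_i^2)^{l_i}}{l_i!}.$$
Similarly, on writing $\Delta_Y = \mathrm{diag}(\delta_{y1},\dots,\delta_{yq})$, we obtain
$$\exp(-t'\Delta_Y t) = \sum_{m=0}^{\infty}(-1)^m\sum_{m_1+\cdots+m_q=m}\ \prod_{j=1}^{q}\frac{(\delta_{yj}t_j^2)^{m_j}}{m_j!}.$$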
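Both expansions are easy to check numerically. The sketch below compares a truncated version of the Taylor series for $(1-e^x)^2$ with the exact value, and sums the multi-index series for $\exp(-s'\Delta_X s)$ with diagonal $\Delta_X$ over a truncated box of indices; the helper names and test values are arbitrary choices for illustration.

```python
import itertools
import math

def taylor_square(x, n_terms=40):
    """Partial sum of sum_{k>=2} (2^k - 2) x^k / k!, the expansion of (1 - e^x)^2."""
    return sum((2 ** k - 2) * x ** k / math.factorial(k) for k in range(2, n_terms))

def exp_quadratic_series(deltas, s, max_order=40):
    """Truncated multi-index series for exp(-s' diag(deltas) s):
    sum over (l_1, ..., l_p) of (-1)^(l_1+...+l_p) prod_i (delta_i s_i^2)^{l_i} / l_i!."""
    total = 0.0
    for ls in itertools.product(range(max_order), repeat=len(deltas)):
        term = float((-1) ** sum(ls))
        for l_i, d_i, s_i in zip(ls, deltas, s):
            term *= (d_i * s_i ** 2) ** l_i / math.factorial(l_i)
        total += term
    return total

print(taylor_square(0.3), (1 - math.exp(0.3)) ** 2)
print(exp_quadratic_series([0.5, 1.5], [1.2, -0.7]),
      math.exp(-(0.5 * 1.2 ** 2 + 1.5 * 0.7 ** 2)))
```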

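Integrating these series term by term, we find that the typical integral to be evaluated is
$$\int_{\mathbb{R}^{p+q}}(s'\Sigma_{XY}t)^k\prod_{i=1}^{p}s_i^{2l_i}\prod_{j=1}^{q}t_j^{2m_j}\,\exp\big(-\sigma_x^2|s|_p^2-\sigma_y^2|t|_q^2\big)\,\frac{ds\,dt}{|s|_p^{p+1}\,|t|_q^{q+1}}.$$
By the substitution $t \mapsto -t$, we find that this integral vanishes if $k$ is odd, and so we need to calculate
$$\int_{\mathbb{R}^{p+q}}(s'\Sigma_{XY}t)^{2k}\prod_{i=1}^{p}s_i^{2l_i}\prod_{j=1}^{q}t_j^{2m_j}\,\exp\big(-\sigma_x^2|s|_p^2-\sigma_y^2|t|_q^2\big)\,\frac{ds\,dt}{|s|_p^{p+1}\,|t|_q^{q+1}}.$$

We transform to polar coordinates $s = r_x\theta$ and $t = r_y\phi$, where $r_x, r_y > 0$, $\theta\in S^{p-1}$, and $\phi\in S^{q-1}$. Then the integrals over $r_x$ and $r_y$ are standard gamma integrals:
$$\int_0^{\infty}\!\!\int_0^{\infty} r_x^{2k+2l-2}\,r_y^{2k+2m-2}\exp\big(-\sigma_x^2r_x^2-\sigma_y^2r_y^2\big)\,dr_x\,dr_y = \frac{\Gamma(k+l-\frac12)\,\Gamma(k+m-\frac12)}{4\,\sigma_x^{2k+2l-1}\,\sigma_y^{2k+2m-1}},$$
where $l = l_1+\cdots+l_p$ and $m = m_1+\cdots+m_q$. As for the integrals over $\theta$ and $\phi$, they are
$$\int_{S^{q-1}}\int_{S^{p-1}}(\theta'\Sigma_{XY}\phi)^{2k}\prod_{i=1}^{p}\theta_i^{2l_i}\prod_{j=1}^{q}\phi_j^{2m_j}\,d\theta\,d\phi.$$
To evaluate these integrals, we expand $(\theta'\Sigma_{XY}\phi)^{2k}$ using the multinomial theorem, obtaining a sum of terms, each of which is homogeneous in $\theta$ and $\phi$. Then we integrate term by term by transforming the surface measures $d\theta$ and $d\phi$ to Euler angles (Anderson, 2003, pp. 285-286). The outcome is a multiple series expansion for the distance covariance. It does not appear to be a series that can be made simple in the general case, but it does provide an explicit expression in terms of $\Sigma$, $p$, and $q$.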
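The radial gamma integrals can likewise be checked numerically. With $a = k + l$ (or $a = k + m$), each radial factor is $\int_0^\infty r^{2a-2}e^{-\sigma^2 r^2}\,dr = \Gamma(a-\frac12)/(2\sigma^{2a-1})$, and the product of the two factors gives the displayed expression. The sketch uses a plain trapezoidal rule on a truncated interval, which is very accurate here because the integrand is smooth, even in $r$, and negligible at the truncation point; the helper names are ours.

```python
import math
import numpy as np

def radial_integral(a, sigma, n=200_001):
    """Trapezoidal approximation of int_0^inf r^(2a-2) exp(-sigma^2 r^2) dr,
    truncated at r = 12 / sigma, where the integrand is negligible."""
    r = np.linspace(0.0, 12.0 / sigma, n)
    f = r ** (2 * a - 2) * np.exp(-(sigma * r) ** 2)
    h = r[1] - r[0]
    return float(h * (f.sum() - 0.5 * (f[0] + f[-1])))

def radial_closed_form(a, sigma):
    """Gamma(a - 1/2) / (2 sigma^(2a - 1)); here a = k + l or a = k + m."""
    return math.gamma(a - 0.5) / (2 * sigma ** (2 * a - 1))

for a in (1, 2, 5):
    print(a, radial_integral(a, 1.5), radial_closed_form(a, 1.5))
```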

Although we chose $\sigma_x^2$ and $\sigma_y^2$ to be the smallest eigenvalues of $\Sigma_X$ and $\Sigma_Y$, respectively, we could have chosen them to be any positive numbers. This is reminiscent of the comprehensive work of Kotz, Johnson, and Boyd (1967a, 1967b) on the distribution of positive definite quadratic forms in normal variables. Bearing in mind those results, it seems likely that an optimal choice for $\sigma_x^2$ and $\sigma_y^2$ will be close to the arithmetic, geometric, or harmonic mean of the eigenvalues of $\Sigma_X$ and $\Sigma_Y$, respectively. In any case, the issue of optimal choices for $\sigma_x^2$ and $\sigma_y^2$ that will accelerate the convergence of the above series is worthy of further investigation.

Finally, we note that our techniques allow for similar explicit expressions in the case of the α-distance dependence measures described by Szekely, et al. (2007, p. 2784) and Szekely and Rizzo (2009, pp. 1251-1252; 2012, p. 2282).